- saveDictionaryXML(String) - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Writes this dictionary as a serialized XML object.
- saveScores(String) - Method in class edu.georgetown.gucs.matcher.CompareFingerprint
-
- saveScores(String) - Method in class edu.georgetown.gucs.matcher.ExperimentN2
-
- score - Variable in class edu.georgetown.gucs.matcher.FingerprintMatcher
-
- ScoreFingerprints - Class in edu.georgetown.gucs.matcher
-
Compares two XML fingerprints and provides a score for their degree of similarity.
- ScoreFingerprints(String, String, String) - Constructor for class edu.georgetown.gucs.matcher.ScoreFingerprints
-
Constructor that creates a matcher object and two Base64-encoded fingerprint string objects
- scores - Variable in class edu.georgetown.gucs.matcher.CompareFingerprint
-
- sdHashFilePath - Static variable in class edu.georgetown.gucs.utility.Global
-
File path to SdHash
- SdhashFingerprinter - Class in edu.georgetown.gucs.fingerprinter
-
- SdhashFingerprinter() - Constructor for class edu.georgetown.gucs.fingerprinter.SdhashFingerprinter
-
- SdhashFingerprinter(String) - Constructor for class edu.georgetown.gucs.fingerprinter.SdhashFingerprinter
-
Constructor that loads a dictionary and its tokenizers.
- SdHashFingerprintMatcher - Class in edu.georgetown.gucs.matcher
-
- SdHashFingerprintMatcher() - Constructor for class edu.georgetown.gucs.matcher.SdHashFingerprintMatcher
-
Constructor that initializes the inherited Matcher parent class
- sdtext - package sdtext
-
- serialize(T, ByteArrayOutputStream) - Method in interface edu.georgetown.gucs.bloomfilter.ICompactSerializer
-
Serialize the specified type into the specified DataOutputStream instance.
- serialize(LongBitSet, DataOutputStream) - Static method in class edu.georgetown.gucs.bloomfilter.LongBitSetSerializer
-
- serializer() - Static method in class edu.georgetown.gucs.bloomfilter.LongFastBloomFilter
-
- set(long) - Method in class edu.georgetown.gucs.bloomfilter.LongBitSet
-
Sets the bit at the specified index to true.
- set(int, int) - Method in class edu.georgetown.gucs.bloomfilter.LongBitSet
-
Sets the bits between from (inclusive) and to (exclusive) to true.
- setA(A) - Method in class edu.georgetown.gucs.utility.Pair
-
Sets the first object in this pair
- setB(B) - Method in class edu.georgetown.gucs.utility.Pair
-
Sets the second object in this pair
- setConfigurations() - Method in class edu.georgetown.gucs.configurations.FingerprintConfiguration
-
- setCreatingProgram(String) - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Sets the name of the program that created this dictionary
- setDictionary(Dictionary) - Method in class edu.georgetown.gucs.fingerprinter.Fingerprinter
-
Sets the dictionary for this fingerprinter.
- setDictionary(String) - Method in class edu.georgetown.gucs.fingerprinter.Fingerprinter
-
Sets the dictionary for this fingerprinter.
- setEndByte(int) - Method in class edu.georgetown.gucs.fingerprinter.Fingerprint
-
- setFileName(String) - Method in class edu.georgetown.gucs.fingerprinter.Fingerprint
-
- setFingerprint(Fingerprint) - Method in class edu.georgetown.gucs.fingerprinter.Fingerprint
-
- setFingerprint(byte[]) - Method in class edu.georgetown.gucs.fingerprinter.Fingerprint
-
- setMangler(boolean) - Method in class edu.georgetown.gucs.fingerprinter.Fingerprinter
-
Added by Evan: removes the manglerToken
- setMangler(String, Dictionary) - Method in class edu.georgetown.gucs.fingerprinter.Fingerprinter
-
Passes the specified mangler settings and a set of tokens to the mangler for this fingerprinter.
- setManglerRNG(Random) - Method in class edu.georgetown.gucs.fingerprinter.Fingerprinter
-
Sets a random number generator for the mangler to allow for repeatability.
- setManglerRNG(Random) - Method in class edu.georgetown.gucs.tokenizers.TokenizerList
-
Sets the random number generator to use with a FileManglerTokenizer object in this list
- setMinimumScore(int) - Method in class edu.georgetown.gucs.matcher.FingerprintMatcher
-
Sets the minimum score for these two fingerprints to be considered a match.
- setMode(String) - Method in class edu.georgetown.gucs.tokenizers.ChineseFileTokenizer
-
Sets the token creation mode.
- setMode(String) - Method in class edu.georgetown.gucs.tokenizers.FileTokenizer
-
- setOutput(boolean, boolean, boolean) - Method in class edu.georgetown.gucs.fingerprinter.Fingerprinter
-
Specifies which information to display in this fingerprinter's digest XML output
- setPosition(int) - Method in class edu.georgetown.gucs.dictionary.DictionaryEntry
-
Sets the position of the token in this dictionary.
- setRNG(Random) - Method in class edu.georgetown.gucs.tokenizers.FileManglerTokenizer
-
Sets the random number generator to use with the manglers that are set in this tokenizer
- setSplitter(String) - Method in class edu.georgetown.gucs.fingerprinter.Fingerprinter
-
Sets the splitter for this fingerprinter
- setSplitter(String) - Method in class edu.georgetown.gucs.tokenizers.TokenizerList
-
Sets the splitter for this list
- setStartByte(int) - Method in class edu.georgetown.gucs.fingerprinter.Fingerprint
-
- setTerse() - Method in class edu.georgetown.gucs.fingerprinter.Fingerprinter
-
Sets this fingerprinter's digest XML output to only display file and fingerprint information
- setTokenizers(TokenizerList) - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Loads the tokenizers to use for this dictionary from a TokenizerList object.
- setTokenizers(List<String>) - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Sets the tokenizers used by this dictionary; only works if no tokenizers have already been set or if the given list
contains the same tokenizers as those that have already been set
- setTokenizers(TokenizerList) - Method in class edu.georgetown.gucs.fingerprinter.SdhashFingerprinter
-
Tokenizes a file based on a list of tokenizers that are passed to the function
- setTokenizers(String) - Method in class edu.georgetown.gucs.fingerprinter.SdhashFingerprinter
-
Sets the tokenizer configuration for the input file
- setVerbose() - Method in class edu.georgetown.gucs.fingerprinter.Fingerprinter
-
Sets this fingerprinter's digest XML output to display all available information
- showBloomFilter - Variable in class edu.georgetown.gucs.configurations.FingerprintConfiguration
-
whether or not the Bloom filter exists
- showBloomFilter - Variable in class edu.georgetown.gucs.fingerprinter.Fingerprinter
-
whether or not the Bloom filter exists
- showDataSource - Variable in class edu.georgetown.gucs.configurations.FingerprintConfiguration
-
whether to store data source information in the output for the fingerprint
- showDataSource - Variable in class edu.georgetown.gucs.fingerprinter.Fingerprinter
-
whether to store data source information in the output for the fingerprint
- showDictionary - Variable in class edu.georgetown.gucs.configurations.FingerprintConfiguration
-
whether to store the full dictionary used for the fingerprint in the output
- showDictionary - Variable in class edu.georgetown.gucs.fingerprinter.Fingerprinter
-
whether to store the full dictionary used for the fingerprint in the output
- showDigest - Variable in class edu.georgetown.gucs.configurations.FingerprintConfiguration
-
whether to store digest information in the output for the fingerprint
- showDigest - Variable in class edu.georgetown.gucs.fingerprinter.Fingerprinter
-
whether to store digest information in the output for the fingerprint
- size() - Method in class edu.georgetown.gucs.bloomfilter.LongBitSet
-
Returns the number of bits of space actually in use by this BitSet to represent bit values.
- size() - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Provides the number of tokens in this dictionary
- size() - Method in class edu.georgetown.gucs.utility.AntlrBitSet
-
Provides the size of this bit set
- size() - Method in class edu.georgetown.gucs.utility.FileLister
-
Provides the size of the list of all files
- splitLines() - Method in class edu.georgetown.gucs.tokenizers.ParseTokenizers
-
- Splitter - Class in edu.georgetown.gucs.tokenizers
-
Partitions a set of tokens into multiple pieces
- Splitter(String, List<Token>) - Constructor for class edu.georgetown.gucs.tokenizers.Splitter
-
Constructor that takes a string indicating the type and percent for the splitter and a list of tokens to split
- splitToken(Vector<Token>) - Method in class edu.georgetown.gucs.tokenizers.FileManglerTokenizer
-
- start_byte() - Method in class edu.georgetown.gucs.utility.Token
-
Provides the starting byte location of this token
- stdOutDictionaryXML() - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Writes dictionary to standard output
- StripMarkupTokenizer - Class in edu.georgetown.gucs.tokenizers
-
Eliminates tokens nested inside markup language tags; assumes that tokens have been split by line rather than using
whitespace
- StripMarkupTokenizer(String) - Constructor for class edu.georgetown.gucs.tokenizers.StripMarkupTokenizer
-
Constructor that specifies whether to keep tokens nested inside script tags; the default is to
eliminate these tokens
- StripPunctuationTokenizer - Class in edu.georgetown.gucs.tokenizers
-
Separates tokens based on punctuation and removes punctuation from tokens
- StripPunctuationTokenizer() - Constructor for class edu.georgetown.gucs.tokenizers.StripPunctuationTokenizer
-
- subset(AntlrBitSet) - Method in class edu.georgetown.gucs.utility.AntlrBitSet
-
Returns whether this bitset is contained within the given antlrBitSet
- subtractInPlace(AntlrBitSet) - Method in class edu.georgetown.gucs.utility.AntlrBitSet
-
Subtracts the elements of the given antlrBitSet from this bitset in-place (turn off all bits of this bitset that
are in the given antlrBitSet)
- summaryReduce(Vector<Token>) - Method in class edu.georgetown.gucs.tokenizers.FileManglerTokenizer
-
- systemID - Variable in class edu.georgetown.gucs.configurations.FingerprintConfiguration
-
the fully qualified domain name for the local host IP address
- systemID - Variable in class edu.georgetown.gucs.fingerprinter.Fingerprinter
-
the fully qualified domain name for the local host IP address
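
The bit-set entries in this index describe two behaviors that are easy to misread: `set(int, int)` sets a half-open range (from inclusive, to exclusive), and `subtractInPlace` turns off every bit that is also on in the argument. The sketch below illustrates both semantics using the standard `java.util.BitSet` as a stand-in. It does not use `LongBitSet` or `AntlrBitSet`, whose implementations are not shown here; the class name `BitSetSemanticsSketch` and its helper methods are hypothetical and exist only for this illustration.

```java
import java.util.BitSet;

// Illustration only: java.util.BitSet stands in for the project's own
// bit-set classes, which may differ in detail.
public class BitSetSemanticsSketch {

    // Half-open range: bits from (inclusive) up to to (exclusive) are set.
    public static BitSet setRange(int from, int to) {
        BitSet bits = new BitSet();
        bits.set(from, to);
        return bits;
    }

    // In-place subtraction: clears every bit of target that is set in other.
    public static void subtractInPlace(BitSet target, BitSet other) {
        target.andNot(other);
    }

    public static void main(String[] args) {
        BitSet a = setRange(2, 6); // bits 2, 3, 4, 5
        BitSet b = setRange(4, 8); // bits 4, 5, 6, 7
        subtractInPlace(a, b);     // a keeps only the bits not in b
        System.out.println(a);     // prints {2, 3}
    }
}
```

Note that bit 6 is not set by `setRange(2, 6)`, which is the practical consequence of the exclusive upper bound.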