- generateCreatingProgram() - Method in class edu.georgetown.gucs.fingerprinter.Fingerprinter
-
Determines the program that created this fingerprinter
- generateXML(List<Fingerprint>, String) - Method in class edu.georgetown.gucs.fingerprinter.Fingerprinter
-
- generateXML(List<Fingerprint>, String, String) - Method in class edu.georgetown.gucs.fingerprinter.Fingerprinter
-
Generates an XML file that contains this fingerprinter's digest
- get(long) - Method in class edu.georgetown.gucs.bloomfilter.LongBitSet
-
Returns the value of the bit with the specified index.
- getA() - Method in class edu.georgetown.gucs.utility.Pair
-
Provides the first object in this pair
- getB() - Method in class edu.georgetown.gucs.utility.Pair
-
Provides the second object in this pair
- getBase64Fingerprint() - Method in class edu.georgetown.gucs.fingerprinter.Fingerprint
-
- getBase64Fingerprints() - Method in class edu.georgetown.gucs.fingerprinter.Fingerprinter
-
Gives this fingerprinter's fingerprint in Base64 encoding
- getBitSetSize() - Method in class edu.georgetown.gucs.bloomfilter.LongFastBloomFilter
-
- getConfigSpec() - Method in class sdtext.Arguments
-
- getCreatingProgram() - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Provides the program that created this dictionary
- getCreation() - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Provides the original creation date of this dictionary
- getCurrentFalsePositiveProbability() - Method in class edu.georgetown.gucs.bloomfilter.LongFastBloomFilter
-
Returns the current false positive probability of the bloom filter based on how many
elements have been added to the filter.
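
The index does not say how this probability is computed. As a rough sketch, the standard Bloom filter approximation p ≈ (1 - e^(-kn/m))^k could be written as below, where m is the bit set size, k the number of hash functions, and n the number of elements added so far. The class name, method name, and parameter order are hypothetical and are not taken from LongFastBloomFilter.

```java
public final class BloomFalsePositiveSketch {

    /**
     * Standard approximation of a Bloom filter's false positive probability.
     * Hypothetical helper: assumes m bits, k hash functions, n inserted elements.
     */
    static double falsePositiveProbability(long bitSetSize, int numHashFunctions, long numElements) {
        // Exponent -k*n/m: the expected fraction of bits still unset is e^(-k*n/m)
        double exponent = -((double) numHashFunctions * numElements) / bitSetSize;
        // A false positive requires all k probed bits to be set
        return Math.pow(1.0 - Math.exp(exponent), numHashFunctions);
    }

    public static void main(String[] args) {
        // Example values chosen for illustration only
        System.out.println(falsePositiveProbability(1000, 3, 100));
    }
}
```

The estimate grows as more elements are added, which is presumably why the method reports a "current" value rather than a fixed one.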
- getCurrentNumberOfElements() - Method in class edu.georgetown.gucs.bloomfilter.LongFastBloomFilter
-
- getDebugSpec() - Method in class sdtext.Arguments
-
- getDictionary() - Method in class edu.georgetown.gucs.fingerprinter.Fingerprinter
-
Gives this fingerprinter's dictionary
- getDictionaryFilename() - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Provides the filename of this dictionary, if this dictionary was loaded from or saved to a file
- getDictionaryName() - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Provides the name of this dictionary
- getDictionarySpec() - Method in class sdtext.Arguments
-
- getDictNumTerms() - Method in class edu.georgetown.gucs.fingerprinter.Fingerprint
-
- getDirectory() - Method in class edu.georgetown.gucs.utility.FileLister
-
Provides the path to the directory used for the initial list of files
- getDirectorySpec() - Method in class sdtext.Arguments
-
- getDocumentCount() - Method in class edu.georgetown.gucs.dictionary.DictionaryEntry
-
Provides the number of documents this token appears in
- getEndByte() - Method in class edu.georgetown.gucs.fingerprinter.Fingerprint
-
- getFileListerArray() - Method in class edu.georgetown.gucs.utility.FileLister
-
- getFileListerList() - Method in class edu.georgetown.gucs.utility.FileLister
-
- getFileName() - Method in class edu.georgetown.gucs.fingerprinter.Fingerprint
-
- getFileName() - Method in class edu.georgetown.gucs.fingerprinter.Fingerprinter
-
- getFileQueue() - Method in class edu.georgetown.gucs.utility.FileLister
-
- getFilter(long, double) - Static method in class edu.georgetown.gucs.bloomfilter.LongFastBloomFilter
-
- getFingerprint() - Method in class edu.georgetown.gucs.fingerprinter.Fingerprint
-
- getFingerprintName() - Method in class edu.georgetown.gucs.fingerprinter.Fingerprinter
-
Gives this fingerprinter's name.
- getFingerprintSpec() - Method in class sdtext.Arguments
-
- getFrequencyCount() - Method in class edu.georgetown.gucs.dictionary.DictionaryEntry
-
Provides the frequency count for this entry
- getGUID() - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Provides the unique identifier for this dictionary
- getIDF() - Method in class edu.georgetown.gucs.dictionary.DictionaryEntry
-
Provides the IDF of this token
- getInputSpec() - Method in class sdtext.Arguments
-
- getLoaded() - Method in class edu.georgetown.gucs.utility.FileLister
-
Indicates whether all files have been loaded into the linked blocking queue
- getLongBitSet() - Method in class edu.georgetown.gucs.bloomfilter.LongFastBloomFilter
-
- getMangler() - Method in class edu.georgetown.gucs.fingerprinter.Fingerprinter
-
Returns the string of the current mangler setting
- getMatch() - Method in class edu.georgetown.gucs.matcher.ScoreFingerprints
-
Depending on the matcher, returns string representations of either the boolean match or integer similarity of these
two fingerprints
- getMatcherName() - Method in class edu.georgetown.gucs.matcher.FingerprintMatcher
-
Provides the name of the matcher used to determine if the two fingerprints have matching documents
- getMatcherSpec() - Method in class sdtext.Arguments
-
- getMaxIDF() - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Provides the largest IDF value in this dictionary
- getMaxIDFFound() - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Provides the largest IDF value in this dictionary
- getMaxIdfSpec() - Method in class sdtext.Arguments
-
- getMaxThread() - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Provides the maximum thread count to use for this dictionary
- getMinBitArraySize(long, double) - Static method in class edu.georgetown.gucs.bloomfilter.BloomFilterCalculations
-
Returns the minimum bit array size (m) to satisfy the desired false
positive probability based on the number of elements expected for the
bloom filter.
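
For illustration, the usual closed form for this quantity is m = ceil(-n ln p / (ln 2)^2), with n the expected number of elements and p the desired false positive probability. The sketch below assumes that formula and assumes the `(long, double)` arguments mean (n, p) in that order; the index confirms neither, and the class and method names are hypothetical.

```java
public final class BloomSizingSketch {

    /**
     * Minimum number of bits m for n expected elements and target
     * false positive probability p, using m = ceil(-n * ln(p) / (ln 2)^2).
     * Hypothetical helper, not the library's implementation.
     */
    static long minBitArraySize(long expectedElements, double falsePositiveProbability) {
        if (expectedElements <= 0 || falsePositiveProbability <= 0.0 || falsePositiveProbability >= 1.0) {
            throw new IllegalArgumentException("need n > 0 and 0 < p < 1");
        }
        double ln2Squared = Math.log(2) * Math.log(2);
        return (long) Math.ceil(-expectedElements * Math.log(falsePositiveProbability) / ln2Squared);
    }

    public static void main(String[] args) {
        // Roughly 9.6 bits per element for a 1% false positive target
        System.out.println(minBitArraySize(1000, 0.01));
    }
}
```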
- getMinIDF() - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Provides the smallest IDF value in this dictionary
- getMinIdfSpec() - Method in class sdtext.Arguments
-
- getMinimumScore() - Method in class edu.georgetown.gucs.matcher.FingerprintMatcher
-
Provides the minimum score for these two fingerprints to be considered a match.
- getMinScore() - Method in class sdtext.Arguments
-
- getNames() - Method in class edu.georgetown.gucs.tokenizers.TokenizerList
-
Provides the ordered list of the tokenizer names in this list
- getNormalizedIDF(Token) - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Provides the normalized inverse document frequency (IDF) of the given token in this dictionary.
- getNormalizedIDF(DictionaryEntry) - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Provides the normalized inverse document frequency (IDF) of the given dictionaryEntry.
- getNumHashFunctions() - Method in class edu.georgetown.gucs.bloomfilter.LongFastBloomFilter
-
- getOutputSpec() - Method in class sdtext.Arguments
-
- getPosition(Token) - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Provides the position of the given token in this dictionary
- getPosition() - Method in class edu.georgetown.gucs.dictionary.DictionaryEntry
-
Provides the position of the token in this dictionary.
- getProperty(String) - Static method in class edu.georgetown.gucs.utility.Global
-
Returns the property specified by the Ant build
- getScore(byte[], byte[]) - Method in class edu.georgetown.gucs.matcher.CosineSimilarityFingerprintMatcher
-
Determines the cosine similarity score for these two fingerprints
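
As a minimal sketch of cosine similarity over two byte arrays: the code below treats each byte as one unsigned vector component and returns a double. Both choices are assumptions; the index does not describe how CosineSimilarityFingerprintMatcher interprets the fingerprint bytes or what type it returns. The class and method names are hypothetical.

```java
public final class CosineSimilaritySketch {

    /**
     * Cosine similarity dot(a, b) / (|a| * |b|) of two equal-length byte vectors.
     * Assumption: each byte is an unsigned component in the range 0..255.
     */
    static double cosineSimilarity(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("fingerprints must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            int x = a[i] & 0xFF; // mask to read the byte as unsigned
            int y = b[i] & 0xFF;
            dot += x * y;
            normA += x * x;
            normB += y * y;
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // convention chosen here for all-zero vectors
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    public static void main(String[] args) {
        System.out.println(cosineSimilarity(new byte[] {1, 2, 3}, new byte[] {3, 2, 1}));
    }
}
```

With non-negative components the score falls between 0 and 1, so it can be compared directly against a minimum-score threshold.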
- getScore(Fingerprint, Fingerprint) - Method in class edu.georgetown.gucs.matcher.CosineSimilarityFingerprintMatcher
-
- getScore(byte[], byte[]) - Method in class edu.georgetown.gucs.matcher.FingerprintMatcher
-
Determines a similarity score for these two fingerprints; must be overridden in each specific matcher
- getScore() - Method in class edu.georgetown.gucs.matcher.FingerprintMatcher
-
Determines a similarity score for these two fingerprints; overridden in GoogleAllPairs
- getScore(String, String) - Method in class edu.georgetown.gucs.matcher.FingerprintMatcher
-
- getScore(Fingerprint, Fingerprint) - Method in class edu.georgetown.gucs.matcher.FingerprintMatcher
-
- getScore(String, String) - Method in class edu.georgetown.gucs.matcher.SdHashFingerprintMatcher
-
Takes two Lists that are of type Pair where the fingerprint is stored in byte form
- getScoreXML(String, String) - Method in class edu.georgetown.gucs.matcher.FingerprintMatcher
-
Determines a similarity score for these two fingerprints.
- getSdhash() - Method in class edu.georgetown.gucs.fingerprinter.Fingerprint
-
- getSource() - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Provides the directory used for this dictionary
- getStartByte() - Method in class edu.georgetown.gucs.fingerprinter.Fingerprint
-
- getStatistics() - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Provides this dictionary's statistics, including the name, the number of documents, the number of tokens, whether this dictionary has been trimmed (and the IDF range, if trimmed), the minimum and maximum IDF, and the list of tokenizers.
- getSublists(double, double) - Method in class edu.georgetown.gucs.utility.FileLister
-
Populates the dictionary and sample lists, waiting for all files to be loaded before randomly assigning files to
one of the lists.
- getSystemID() - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Provides the identifier for the system that this dictionary was created on
- getToken() - Method in class edu.georgetown.gucs.dictionary.DictionaryEntry
-
Provides the token
- getTokenizerNames() - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Provides the names of the tokenizers
- getTokenizers() - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Provides the TokenizerList object used to create this dictionary
- getTokenizers() - Method in class edu.georgetown.gucs.tokenizers.TokenizerList
-
- getTokenList() - Method in class edu.georgetown.gucs.fingerprinter.Fingerprint
-
- getTokens() - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Provides a vector of the tokens in this dictionary
- getTokenVectorMap() - Method in class edu.georgetown.gucs.tokenizers.Tokenizer
-
Provides the list of each token in order of its appearance
- getTotalDocuments() - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Provides the number of documents processed for this dictionary
- Global - Class in edu.georgetown.gucs.utility
-
Global information
- Global() - Constructor for class edu.georgetown.gucs.utility.Global
-
- growToInclude(int) - Method in class edu.georgetown.gucs.utility.AntlrBitSet
-
Grows the set to a larger number of bits
- GUID - Variable in class edu.georgetown.gucs.configurations.FingerprintConfiguration
-
the globally unique identifier for this fingerprinter
- GUID - Variable in class edu.georgetown.gucs.fingerprinter.Fingerprinter
-
the globally unique identifier for this fingerprinter
- GzippedFileTokenizer - Class in edu.georgetown.gucs.tokenizers
-
Splits contents of a gzipped text file into tokens based on whitespace or by line.
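
The two splitting modes described above might look roughly like the following sketch, which reads a gzipped stream with `java.util.zip.GZIPInputStream` and emits either whitespace-separated tokens or whole lines. The `Mode` enum, the class, and its method names are hypothetical; the real class selects its mode through a String constructor argument whose accepted values are not listed here. UTF-8 input is also an assumption.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public final class GzipTokenizerSketch {

    /** Hypothetical stand-in for the tokenizer's token creation mode. */
    enum Mode { WHITESPACE, LINE }

    /** Decompresses the stream and splits its text according to the mode. */
    static List<String> tokenize(InputStream gzipped, Mode mode) {
        List<String> tokens = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (mode == Mode.LINE) {
                    tokens.add(line); // one token per line
                } else {
                    for (String token : line.trim().split("\\s+")) {
                        if (!token.isEmpty()) {
                            tokens.add(token); // skip the empty string produced by blank lines
                        }
                    }
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return tokens;
    }

    /** Demo helper: gzips a string in memory so the example needs no file. */
    static byte[] gzip(String text) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (GZIPOutputStream out = new GZIPOutputStream(buffer)) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = gzip("alpha beta\ngamma  delta\n");
        System.out.println(tokenize(new ByteArrayInputStream(data), Mode.WHITESPACE));
        System.out.println(tokenize(new ByteArrayInputStream(data), Mode.LINE));
    }
}
```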
- GzippedFileTokenizer() - Constructor for class edu.georgetown.gucs.tokenizers.GzippedFileTokenizer
-
Constructor that sets the token creation mode to split based on whitespace
- GzippedFileTokenizer(String) - Constructor for class edu.georgetown.gucs.tokenizers.GzippedFileTokenizer
-
Constructor that sets the token creation mode.