- takeFile() - Method in class edu.georgetown.gucs.utility.FileLister
-
Retrieves and removes the head of the file queue, waiting if necessary until an element becomes available
- targetFile - Variable in class edu.georgetown.gucs.configurations.FingerprintConfiguration
-
The filename of the document this fingerprint is for
- targetFile - Variable in class edu.georgetown.gucs.fingerprinter.Fingerprinter
-
The filename of the document this fingerprint is for
- threadingOn(int) - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Turns on threading for creating this dictionary.
- TikaFileTokenizer - Class in edu.georgetown.gucs.tokenizers
-
- TikaFileTokenizer() - Constructor for class edu.georgetown.gucs.tokenizers.TikaFileTokenizer
-
Constructor that sets the token creation mode to split based on whitespace
- toArray() - Method in class edu.georgetown.gucs.utility.AntlrBitSet
-
Provides an array of integers
- toBytes() - Method in class edu.georgetown.gucs.utility.AntlrBitSet
-
Converts this bitset to an array of bytes
- toHex() - Method in class edu.georgetown.gucs.utility.AntlrBitSet
-
Converts this bitset to a hexadecimal string
- Token - Class in edu.georgetown.gucs.utility
-
Provides the token parameters, including the name and byte information
- Token(String, int, int) - Constructor for class edu.georgetown.gucs.utility.Token
-
Constructor that sets all attributes of a token
- Token(String) - Constructor for class edu.georgetown.gucs.utility.Token
-
Constructor that sets all attributes of a token
- tokenize(String) - Method in class edu.georgetown.gucs.tokenizers.ArabicFileTokenizer
-
Splits the document into tokens.
- tokenize(String) - Method in class edu.georgetown.gucs.tokenizers.ChineseFileTokenizer
-
Splits a Chinese or mixed Chinese-English document into tokens based on the token creation mode.
- tokenize(List<Token>) - Method in class edu.georgetown.gucs.tokenizers.FileManglerTokenizer
-
Alters or eliminates certain tokens based on the given mangler settings
- tokenize(String) - Method in class edu.georgetown.gucs.tokenizers.FileTokenizer
-
- tokenize(String) - Method in class edu.georgetown.gucs.tokenizers.GzippedFileTokenizer
-
Splits a gzipped text document into tokens
- tokenize(List<Token>) - Method in class edu.georgetown.gucs.tokenizers.MaximumLengthTokenizer
-
Eliminates tokens that are longer than the length specified in the constructor
- tokenize(String) - Method in class edu.georgetown.gucs.tokenizers.MemoryTokenizer
-
- tokenize(List<Token>) - Method in class edu.georgetown.gucs.tokenizers.MinimumLengthTokenizer
-
Eliminates tokens that are shorter than the length specified in the constructor
- tokenize(List<Token>) - Method in class edu.georgetown.gucs.tokenizers.PorterTokenizer
-
Changes English language tokens into their root form
- tokenize(List<Token>) - Method in class edu.georgetown.gucs.tokenizers.RemoveNumericTokensTokenizer
-
Eliminates tokens that are numbers
- tokenize(List<Token>) - Method in class edu.georgetown.gucs.tokenizers.RemoveTokensWithNumbersTokenizer
-
Eliminates tokens containing numbers
- tokenize(List<Token>) - Method in class edu.georgetown.gucs.tokenizers.StripMarkupTokenizer
-
Eliminates tokens nested inside markup language tags; assumes that tokens have been split by line rather than by whitespace
- tokenize(List<Token>) - Method in class edu.georgetown.gucs.tokenizers.StripPunctuationTokenizer
-
Separates tokens based on punctuation and removes punctuation from tokens
- tokenize(List<Token>) - Method in class edu.georgetown.gucs.tokenizers.Tokenizer
-
Alters or eliminates certain tokens.
- tokenize(String) - Method in class edu.georgetown.gucs.tokenizers.Tokenizer
-
Splits the document into tokens.
- tokenizeEachEntry(List<Tokenizer>) - Method in class edu.georgetown.gucs.tokenizers.Splitter
-
Tokenizes each splitter entry with the list of tokenizers; assumes no FileTokenizers are in the list
- tokenizeFile(String) - Method in class edu.georgetown.gucs.tokenizers.FileTokenizer
-
- tokenizeFile(String) - Method in class edu.georgetown.gucs.tokenizers.TokenizerList
-
Applies each tokenizer from this list, in order, on the file; the first tokenizer must be able to read from a file and the tokenizers must already be instantiated
- Tokenizer - Class in edu.georgetown.gucs.tokenizers
-
- Tokenizer() - Constructor for class edu.georgetown.gucs.tokenizers.Tokenizer
-
- TokenizerList - Class in edu.georgetown.gucs.tokenizers
-
- TokenizerList() - Constructor for class edu.georgetown.gucs.tokenizers.TokenizerList
-
Constructor that initializes an empty list of tokenizers
- TokenizerList(List<String>) - Constructor for class edu.georgetown.gucs.tokenizers.TokenizerList
-
Constructor that takes a list of the tokenizer names
- tokenizeString(String) - Method in class edu.georgetown.gucs.tokenizers.MemoryTokenizer
-
- tokenizeString(String) - Method in class edu.georgetown.gucs.tokenizers.TokenizerList
-
- tokenVectorMap - Variable in class edu.georgetown.gucs.tokenizers.Tokenizer
-
The map containing the splitter name and its corresponding list of token elements
- toPackedArray() - Method in class edu.georgetown.gucs.utility.AntlrBitSet
-
Creates and returns a copy of this antlrBitSet's bitset
- toString() - Method in class edu.georgetown.gucs.fingerprinter.BitVectorFingerprinter
-
A string representation of the BitVectorFingerprinter
- toString() - Method in class edu.georgetown.gucs.matcher.CosineSimilarityFingerprintMatcher
-
- toString() - Method in class edu.georgetown.gucs.matcher.SdHashFingerprintMatcher
-
- toString() - Method in class edu.georgetown.gucs.tokenizers.Tokenizer
-
- toString() - Method in class edu.georgetown.gucs.tokenizers.TokenizerList
-
- toString() - Method in class edu.georgetown.gucs.utility.AntlrBitSet
-
Transforms a bit set into a comma-separated string by formatting each element as an integer
- toString(String) - Method in class edu.georgetown.gucs.utility.AntlrBitSet
-
Transforms a bit set into a string by formatting each element as an integer
- toString(String, List<String>) - Method in class edu.georgetown.gucs.utility.AntlrBitSet
-
Creates a string representation in which, instead of integer elements, the ith element of a list of strings is displayed
- toString() - Method in class edu.georgetown.gucs.utility.FileLister
-
Returns a listing of the files in this FileLister, one per line
- toString() - Method in class edu.georgetown.gucs.utility.Pair
-
Provides a string representation of the pair in the form (a, b)
- toString() - Method in class edu.georgetown.gucs.utility.Token
-
- toStringOfHalfWords() - Method in class edu.georgetown.gucs.utility.AntlrBitSet
-
Dumps a comma-separated list of the words making up the bit set; splits each 64-bit number into two more manageable 32-bit numbers; generates a comma-separated list of C++-like unsigned long constants
- toStringOfWords() - Method in class edu.georgetown.gucs.utility.AntlrBitSet
-
Provides a comma-separated list of Java-like long int constants that make up the bit set
- trim(String) - Method in class edu.georgetown.gucs.dictionary.TrimDictionary
-
Trims the dictionary and saves it to the given output file
- trimByIDF(double, double) - Method in class edu.georgetown.gucs.dictionary.Dictionary
-
Trims this dictionary by removing any token that is outside a range of normalized IDFs.
- TrimDictionary - Class in edu.georgetown.gucs.dictionary
-
Creates a Dictionary containing tokens with IDFs within a specified range
- TrimDictionary(double, double, String) - Constructor for class edu.georgetown.gucs.dictionary.TrimDictionary
-
Constructor that specifies the IDF range and dictionary to trim
- TRUE_NEGATIVE - Static variable in class edu.georgetown.gucs.utility.Global
-
Location of true-negative count in array of experiment/trial results
- TRUE_POSITIVE - Static variable in class edu.georgetown.gucs.utility.Global
-
Location of true-positive count in array of experiment/trial results