A B C D E F G H I L M N O P R S T U V W X 

T

takeFile() - Method in class edu.georgetown.gucs.utility.FileLister
Retrieves and removes the head of the file queue, waiting if necessary until an element becomes available
targetFile - Variable in class edu.georgetown.gucs.configurations.FingerprintConfiguration
the filename of the document this fingerprint is for
targetFile - Variable in class edu.georgetown.gucs.fingerprinter.Fingerprinter
the filename of the document this fingerprint is for
threadingOn(int) - Method in class edu.georgetown.gucs.dictionary.Dictionary
Turns on threading for creating this dictionary.
TikaFileTokenizer - Class in edu.georgetown.gucs.tokenizers
 
TikaFileTokenizer() - Constructor for class edu.georgetown.gucs.tokenizers.TikaFileTokenizer
Constructor that sets the token creation mode to split based on whitespace
toArray() - Method in class edu.georgetown.gucs.utility.AntlrBitSet
Provides an array of integers
toBytes() - Method in class edu.georgetown.gucs.utility.AntlrBitSet
Converts this bitset to an array of bytes
toHex() - Method in class edu.georgetown.gucs.utility.AntlrBitSet
Converts this bitset to a hexadecimal string
Token - Class in edu.georgetown.gucs.utility
Provides the token parameters, including the name and byte information
Token(String, int, int) - Constructor for class edu.georgetown.gucs.utility.Token
Constructor that sets all attributes of a token
Token(String) - Constructor for class edu.georgetown.gucs.utility.Token
Constructor that sets all attributes of a token
tokenize(String) - Method in class edu.georgetown.gucs.tokenizers.ArabicFileTokenizer
Splits the document into tokens.
tokenize(String) - Method in class edu.georgetown.gucs.tokenizers.ChineseFileTokenizer
Splits a Chinese or mixed Chinese-English document into tokens based on the token creation mode.
tokenize(List<Token>) - Method in class edu.georgetown.gucs.tokenizers.FileManglerTokenizer
Alters or eliminates certain tokens based on the given mangler settings
tokenize(String) - Method in class edu.georgetown.gucs.tokenizers.FileTokenizer
 
tokenize(String) - Method in class edu.georgetown.gucs.tokenizers.GzippedFileTokenizer
Splits a gzipped text document into tokens
tokenize(List<Token>) - Method in class edu.georgetown.gucs.tokenizers.MaximumLengthTokenizer
Eliminates tokens that are longer than the length specified in the constructor
tokenize(String) - Method in class edu.georgetown.gucs.tokenizers.MemoryTokenizer
 
tokenize(List<Token>) - Method in class edu.georgetown.gucs.tokenizers.MinimumLengthTokenizer
Eliminates tokens that are shorter than the length specified in the constructor
tokenize(List<Token>) - Method in class edu.georgetown.gucs.tokenizers.PorterTokenizer
Changes English language tokens into their root form
tokenize(List<Token>) - Method in class edu.georgetown.gucs.tokenizers.RemoveNumericTokensTokenizer
Eliminates tokens that are numbers
tokenize(List<Token>) - Method in class edu.georgetown.gucs.tokenizers.RemoveTokensWithNumbersTokenizer
Eliminates tokens containing numbers
tokenize(List<Token>) - Method in class edu.georgetown.gucs.tokenizers.StripMarkupTokenizer
Eliminates tokens nested inside markup language tags; assumes that tokens have been split by line rather than using whitespace
tokenize(List<Token>) - Method in class edu.georgetown.gucs.tokenizers.StripPunctuationTokenizer
Separates tokens based on punctuation and removes punctuation from tokens
tokenize(List<Token>) - Method in class edu.georgetown.gucs.tokenizers.Tokenizer
Alters or eliminates certain tokens.
tokenize(String) - Method in class edu.georgetown.gucs.tokenizers.Tokenizer
Splits the document into tokens.
tokenizeEachEntry(List<Tokenizer>) - Method in class edu.georgetown.gucs.tokenizers.Splitter
Tokenizes each splitter entry with the list of tokenizers; assumes no FileTokenizers are in the list
tokenizeFile(String) - Method in class edu.georgetown.gucs.tokenizers.FileTokenizer
 
tokenizeFile(String) - Method in class edu.georgetown.gucs.tokenizers.TokenizerList
Applies each tokenizer from this list, in order, on the file; the first tokenizer must be able to read from a file and the tokenizers must already be instantiated
Tokenizer - Class in edu.georgetown.gucs.tokenizers
 
Tokenizer() - Constructor for class edu.georgetown.gucs.tokenizers.Tokenizer
 
TokenizerList - Class in edu.georgetown.gucs.tokenizers
An ordered list of edu.georgetown.gucs.tokenizers objects to split a document into tokens and alter the tokens in various ways.
TokenizerList() - Constructor for class edu.georgetown.gucs.tokenizers.TokenizerList
Constructor that initializes an empty list of tokenizers
TokenizerList(List<String>) - Constructor for class edu.georgetown.gucs.tokenizers.TokenizerList
Constructor that takes a list of the tokenizer names
tokenizeString(String) - Method in class edu.georgetown.gucs.tokenizers.MemoryTokenizer
 
tokenizeString(String) - Method in class edu.georgetown.gucs.tokenizers.TokenizerList
 
tokenVectorMap - Variable in class edu.georgetown.gucs.tokenizers.Tokenizer
The map containing the splitter name and its corresponding list of token elements
toPackedArray() - Method in class edu.georgetown.gucs.utility.AntlrBitSet
Creates and returns a copy of this antlrBitSet's bitset
toString() - Method in class edu.georgetown.gucs.fingerprinter.BitVectorFingerprinter
A string representation of the BitVectorFingerprinter
toString() - Method in class edu.georgetown.gucs.matcher.CosineSimilarityFingerprintMatcher
 
toString() - Method in class edu.georgetown.gucs.matcher.SdHashFingerprintMatcher
 
toString() - Method in class edu.georgetown.gucs.tokenizers.Tokenizer
 
toString() - Method in class edu.georgetown.gucs.tokenizers.TokenizerList
 
toString() - Method in class edu.georgetown.gucs.utility.AntlrBitSet
Transforms this bit set into a comma-separated string, formatting each element as an integer
toString(String) - Method in class edu.georgetown.gucs.utility.AntlrBitSet
Transforms this bit set into a string, formatting each element as an integer
toString(String, List<String>) - Method in class edu.georgetown.gucs.utility.AntlrBitSet
Creates a string representation in which each integer element i is replaced by the ith element of the given list of strings
toString() - Method in class edu.georgetown.gucs.utility.FileLister
Returns a list of each of the files in this fileLister, each on a separate line
toString() - Method in class edu.georgetown.gucs.utility.Pair
Provides a string representation of the pair in the form (a, b)
toString() - Method in class edu.georgetown.gucs.utility.Token
 
toStringOfHalfWords() - Method in class edu.georgetown.gucs.utility.AntlrBitSet
Dumps a comma-separated list of the words making up the bit set; splits each 64-bit number into two more manageable 32-bit numbers and generates a comma-separated list of C++-like unsigned long constants
toStringOfWords() - Method in class edu.georgetown.gucs.utility.AntlrBitSet
Provides a comma-separated list of Java-like long int constants that make up the bit set
trim(String) - Method in class edu.georgetown.gucs.dictionary.TrimDictionary
Trims the dictionary and saves it to the given output file
trimByIDF(double, double) - Method in class edu.georgetown.gucs.dictionary.Dictionary
Trims this dictionary by removing any token that is outside a range of normalized IDFs.
TrimDictionary - Class in edu.georgetown.gucs.dictionary
Creates a Dictionary containing tokens with IDFs within a specified range
TrimDictionary(double, double, String) - Constructor for class edu.georgetown.gucs.dictionary.TrimDictionary
Constructor that specifies the IDF range and dictionary to trim
TRUE_NEGATIVE - Static variable in class edu.georgetown.gucs.utility.Global
Location of true-negative count in array of experiment/trial results
TRUE_POSITIVE - Static variable in class edu.georgetown.gucs.utility.Global
Location of true-positive count in array of experiment/trial results
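
Several entries above describe tokenizers that either split a document into tokens (tokenize(String)) or alter and eliminate tokens from an existing list (tokenize(List<Token>)), with TokenizerList applying them in order. The following is a minimal sketch of that chaining idea only. It does not use the edu.georgetown.gucs classes: TokenPipelineSketch, splitOnWhitespace, minimumLength, removeNumeric, and run are hypothetical stand-ins, and plain strings take the place of Token objects.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical illustration; none of these names come from the indexed library.
public class TokenPipelineSketch {

    // Source stage: splits raw text into tokens on whitespace.
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    // Filter stage: drops tokens shorter than the given minimum length.
    static UnaryOperator<List<String>> minimumLength(int min) {
        return tokens -> {
            List<String> kept = new ArrayList<>();
            for (String token : tokens) {
                if (token.length() >= min) {
                    kept.add(token);
                }
            }
            return kept;
        };
    }

    // Filter stage: drops tokens made up entirely of digits.
    static UnaryOperator<List<String>> removeNumeric() {
        return tokens -> {
            List<String> kept = new ArrayList<>();
            for (String token : tokens) {
                if (!token.matches("\\d+")) {
                    kept.add(token);
                }
            }
            return kept;
        };
    }

    // Runs the source stage first, then applies each filter stage in order.
    static List<String> run(String text, List<UnaryOperator<List<String>>> stages) {
        List<String> tokens = splitOnWhitespace(text);
        for (UnaryOperator<List<String>> stage : stages) {
            tokens = stage.apply(tokens);
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<UnaryOperator<List<String>>> stages = new ArrayList<>();
        stages.add(removeNumeric());
        stages.add(minimumLength(3));
        // Prints [index, sample, tokens]
        System.out.println(run("an index of 42 sample tokens", stages));
    }
}
```

The order of the stages matters in this sketch: the source stage must come first because it is the only one that accepts raw text, which mirrors the requirement noted for TokenizerList.tokenizeFile(String) that the first tokenizer be able to read from a file.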