Class | Description |
---|---|
Dictionary | Creates a list of unique tokens extracted from a collection of documents. The list can be trimmed by removing tokens based on various attributes. Used for creating document fingerprints based on the words that appear in a document collection. |
DictionaryEntry | Statistics kept on a per-token basis within a Dictionary |
DictionaryWorker | Thread for processing a file to add to a Dictionary |
ShowDictionaryStatistics | Prints statistics for a Dictionary: name; language; number of documents; number of tokens; whether the dictionary has been trimmed (including the IDF range, if trimmed); minimum and maximum IDF; list of tokenizers |
ShowDictionaryTokens | Prints all the tokens and their frequencies, IDFs, and normalized IDFs for a Dictionary |
TrimDictionary | Creates a Dictionary containing tokens with IDFs within a specified range |
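
The idea behind trimming a dictionary by IDF range can be illustrated with a minimal, self-contained sketch. It does not use this package's classes: `IdfTrimSketch` and its methods are hypothetical names, and the IDF formula (natural log of the number of documents divided by the number of documents containing the token) is an assumption, since the summary above does not state which definition the package uses.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch; not the package's actual Dictionary or TrimDictionary classes.
class IdfTrimSketch {

    // Counts, for each unique token, the number of documents that contain it.
    static Map<String, Integer> documentFrequencies(List<List<String>> documents) {
        Map<String, Integer> df = new TreeMap<>();
        for (List<String> doc : documents) {
            Set<String> unique = new HashSet<>(doc); // count each token once per document
            for (String token : unique) {
                df.merge(token, 1, Integer::sum);
            }
        }
        return df;
    }

    // Assumed IDF definition: ln(number of documents / documents containing the token).
    static double idf(int numDocuments, int documentFrequency) {
        return Math.log((double) numDocuments / documentFrequency);
    }

    // Keeps only the tokens whose IDF lies in [minIdf, maxIdf], mapped to that IDF.
    static Map<String, Double> trim(Map<String, Integer> df, int numDocuments,
                                    double minIdf, double maxIdf) {
        Map<String, Double> kept = new TreeMap<>();
        for (Map.Entry<String, Integer> e : df.entrySet()) {
            double value = idf(numDocuments, e.getValue());
            if (value >= minIdf && value <= maxIdf) {
                kept.put(e.getKey(), value);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of("the", "cat", "sat"),
            List.of("the", "dog", "sat"),
            List.of("the", "bird", "flew"));
        Map<String, Integer> df = documentFrequencies(docs);
        // "the" appears in every document (IDF 0) and the rare words have IDF ln(3);
        // a range of [0.1, 1.0] drops both extremes.
        Map<String, Double> trimmed = trim(df, docs.size(), 0.1, 1.0);
        System.out.println(trimmed.keySet()); // prints [sat]
    }
}
```

Trimming both ends of the IDF range in this way removes tokens that occur in nearly every document as well as tokens that occur in very few, which is one plausible reason to trim a dictionary before using it for fingerprinting.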