Package | Description |
---|---|
edu.georgetown.gucs.tokenizers | |

Modifier and Type | Class and Description |
---|---|
class | ArabicFileTokenizer: Splits contents of an Arabic text file into tokens using Apache Lucene's ArabicAnalyzer. |
class | ChineseFileTokenizer: Splits contents of a Chinese or Chinese-English text file into tokens using Apache Lucene's ChineseAnalyzer or SmartChineseAnalyzer. |
class | GzippedFileTokenizer: Splits contents of a gzipped text file into tokens based on whitespace or by line. |
class | OutsideInFileTokenizer: Uses Oracle's OutsideIn technology to extract text from a file. |
class | ParseTokenizers |
class | TikaFileTokenizer |
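
The GzippedFileTokenizer entry describes splitting a gzipped text file on whitespace or by line. The listing does not show that class's constructors or methods, so the following is only a minimal sketch of the general technique using the standard `java.util.zip` API. The class name `GzipTokenizeSketch`, the method `tokenize`, and the `byLine` flag are hypothetical names chosen for illustration, and UTF-8 input is an assumption.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;

// Hypothetical illustration; not the GzippedFileTokenizer API.
public class GzipTokenizeSketch {

    /**
     * Reads gzip-compressed text (assumed UTF-8) and returns its tokens.
     * When byLine is true, each non-empty line is one token;
     * otherwise each line is split on runs of whitespace.
     */
    public static List<String> tokenize(InputStream gzipped, boolean byLine) throws IOException {
        List<String> tokens = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(new GZIPInputStream(gzipped), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                if (byLine) {
                    // Line mode: keep the whole line, skipping empty ones.
                    if (!line.isEmpty()) {
                        tokens.add(line);
                    }
                } else {
                    // Whitespace mode: split and drop empty fragments.
                    for (String token : line.trim().split("\\s+")) {
                        if (!token.isEmpty()) {
                            tokens.add(token);
                        }
                    }
                }
            }
        }
        return tokens;
    }
}
```

To read from disk, a caller could pass `java.nio.file.Files.newInputStream(path)` as the `gzipped` argument. Decompressing through a stream keeps memory use proportional to the longest line rather than to the whole file.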