public class TikaFileTokenizer extends FileTokenizer
mode_
tokenVectorMap
Constructor and Description |
---|
TikaFileTokenizer(): Constructor that sets the token creation mode to split on whitespace |
Modifier and Type | Method and Description |
---|---|
java.util.List<Token> | readFile(java.lang.String fileName): Overrides readFile in FileTokenizer to extract text from non-traditional text files |
addTokenizers, setMode, tokenize, tokenizeFile
getTokenVectorMap, iterator, printTokens, tokenize, toString
public TikaFileTokenizer()
public java.util.List<Token> readFile(java.lang.String fileName)
Overrides: readFile in class FileTokenizer
Parameters: fileName - the file to extract text from
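The override pattern described above can be illustrated with a minimal, self-contained sketch. It does not use the real classes: `Token`, `PlainFileTokenizer`, and `ExtractingFileTokenizer` are hypothetical stand-ins for `Token`, `FileTokenizer`, and `TikaFileTokenizer`, and the tag-stripping regex is only a placeholder for whatever text extraction the real `readFile` performs. The only behavior taken from this page is that `readFile(String)` returns a `List<Token>`, overrides the parent method, and that tokens are split on whitespace.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class ReadFileSketch {

    /** Hypothetical stand-in for the library's Token type. */
    static final class Token {
        final String text;
        Token(String text) { this.text = text; }
        @Override public String toString() { return text; }
    }

    /** Hypothetical stand-in for FileTokenizer: treats the file as plain text. */
    static class PlainFileTokenizer {
        List<Token> readFile(String fileName) {
            return splitOnWhitespace(readRaw(Paths.get(fileName)));
        }

        /** Reads the whole file as UTF-8, rethrowing I/O failures unchecked. */
        static String readRaw(Path path) {
            try {
                return new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }

        /** Splits on runs of whitespace and skips empty pieces. */
        static List<Token> splitOnWhitespace(String content) {
            List<Token> tokens = new ArrayList<>();
            for (String piece : content.trim().split("\\s+")) {
                if (!piece.isEmpty()) {
                    tokens.add(new Token(piece));
                }
            }
            return tokens;
        }
    }

    /** Hypothetical subclass: extracts text first, then tokenizes it. */
    static class ExtractingFileTokenizer extends PlainFileTokenizer {
        @Override
        List<Token> readFile(String fileName) {
            return splitOnWhitespace(extractText(Paths.get(fileName)));
        }

        /** Placeholder extraction (assumption): replaces markup tags with spaces. */
        String extractText(Path path) {
            return readRaw(path).replaceAll("<[^>]*>", " ");
        }
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("sketch", ".html");
        Files.write(tmp, "<p>hello   tika</p>\n<b>world</b>".getBytes(StandardCharsets.UTF_8));
        List<Token> tokens = new ExtractingFileTokenizer().readFile(tmp.toString());
        System.out.println(tokens); // prints [hello, tika, world]
        Files.delete(tmp);
    }
}
```

With the real class, the equivalent call would presumably be `new TikaFileTokenizer().readFile(fileName)`, which this page documents as returning a `java.util.List<Token>`.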