public class OutsideInFileTokenizer extends FileTokenizer
Inherited fields: mode_, tokenVectorMap
Constructor and Description |
---|
OutsideInFileTokenizer(): Constructor that sets the token creation mode to split on whitespace. |
Modifier and Type | Method and Description |
---|---|
protected java.util.List<Token> | readFile(java.lang.String filename): Calls Oracle's Outside In API to extract text from a file and splits that text into tokens. |
Inherited methods:
addTokenizers, setMode, tokenize, tokenizeFile
getTokenVectorMap, iterator, printTokens, tokenize, toString
public OutsideInFileTokenizer()
protected java.util.List<Token> readFile(java.lang.String filename)
readFile in class FileTokenizer
Parameters: filename - the string filename of the document to split into tokens
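The whitespace-splitting step that readFile performs on the extracted text can be illustrated with a minimal sketch. It does not call the Outside In API: the class name WhitespaceSplitSketch and the helper splitOnWhitespace are hypothetical, plain strings stand in for Token objects, and a plain-text read stands in for the extraction step, so treat it as an approximation of the behavior described above, not the actual implementation.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in; the real class extracts text through Oracle's Outside In API.
class WhitespaceSplitSketch {

    /** Splits already-extracted text into tokens on runs of whitespace. */
    static List<String> splitOnWhitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String piece : text.trim().split("\\s+")) {
            // An empty or all-whitespace input yields one empty piece; skip it.
            if (!piece.isEmpty()) {
                tokens.add(piece);
            }
        }
        return tokens;
    }

    /** Assumption: a UTF-8 plain-text read replaces the Outside In extraction step. */
    static List<String> readFile(String filename) throws IOException {
        String text = new String(Files.readAllBytes(Paths.get(filename)), StandardCharsets.UTF_8);
        return splitOnWhitespace(text);
    }

    public static void main(String[] args) {
        // Prints [Outside, In, extracts, text, from, files]
        System.out.println(splitOnWhitespace("Outside In  extracts\ttext\nfrom files"));
    }
}
```

In the real class, readFile is protected, so callers would presumably reach it through one of the inherited public methods such as tokenizeFile rather than invoking it directly.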