public class GzippedFileTokenizer extends FileTokenizer
mode_
tokenVectorMap
Constructor and Description |
---|
GzippedFileTokenizer(): Constructor that sets the token creation mode to split based on whitespace. |
GzippedFileTokenizer(java.lang.String mode): Constructor that sets the token creation mode. |
Modifier and Type | Method and Description |
---|---|
protected java.util.List<Token> | readFile(java.lang.String filename): Splits a gzipped text document into tokens. |
java.util.List<Token> | tokenize(java.lang.String filename): Splits a gzipped text document into tokens. |
addTokenizers, setMode, tokenizeFile
getTokenVectorMap, iterator, printTokens, tokenize, toString
public GzippedFileTokenizer()
public GzippedFileTokenizer(java.lang.String mode)
mode - the string mode to split tokens based on whitespace or by line
public java.util.List<Token> tokenize(java.lang.String filename)
tokenize
in class FileTokenizer
filename - the filename of the document to split into tokens
protected java.util.List<Token> readFile(java.lang.String filename)
readFile
in class FileTokenizer
filename - the string filename of the document to split into tokens
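
The following is a minimal, self-contained sketch of the general technique the descriptions above suggest: decompress a gzipped text file and split it into tokens either on whitespace or by line. It is not the implementation of GzippedFileTokenizer. The class name GzipTokenizeSketch, the boolean byLine flag, the writeGzip helper, the use of plain strings in place of Token, the UTF-8 encoding, and the skipping of blank lines are all assumptions made for illustration. The actual mode strings accepted by the constructor are not documented here, so none are shown.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch; NOT the library's GzippedFileTokenizer.
class GzipTokenizeSketch {

    // Reads a gzipped text file (assumed UTF-8) and returns its tokens as strings.
    // byLine == true  -> one token per non-blank line (trimmed);
    // byLine == false -> tokens are split on runs of whitespace.
    static List<String> tokenize(Path gzFile, boolean byLine) throws IOException {
        List<String> tokens = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(
                new GZIPInputStream(Files.newInputStream(gzFile)),
                StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    continue; // assumption: blank lines produce no tokens
                }
                if (byLine) {
                    tokens.add(trimmed);
                } else {
                    for (String word : trimmed.split("\\s+")) {
                        tokens.add(word);
                    }
                }
            }
        }
        return tokens;
    }

    // Helper for trying the sketch: writes text to a temporary gzipped file.
    static Path writeGzip(String text) throws IOException {
        Path tmp = Files.createTempFile("sketch", ".txt.gz");
        try (GZIPOutputStream out = new GZIPOutputStream(Files.newOutputStream(tmp))) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        }
        return tmp;
    }
}
```

To try it, write a short string with writeGzip and pass the returned path to tokenize with byLine set to false or true. Wrapping the file stream in GZIPInputStream before the reader means the file is decompressed as it is read, so the archive never needs to be unpacked to disk.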