Inherited fields: positions, tokenVector
Constructor | Description |
---|---|
TokenizeFile() | Constructor that initializes an empty string vector of tokenizers to use. |
TokenizeFile(java.lang.String config_file) | Constructor that sets the tokenizers to use on this document from a configuration file. |
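The configuration-file constructor, and addTokenizers in the method summary, both read tokenizer names from XML. The config schema is not documented here, so the sketch below assumes a hypothetical format with one tokenizer element per name, and parses from a String instead of a file so it runs on its own. The class TokenizerConfigSketch and the method addTokenizersFromXml are illustrative names, not part of TokenizeFile; the sketch only mirrors the documented difference between "sets" (replace the list) and "adds" (append to it).

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Arrays;
import java.util.Vector;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class TokenizerConfigSketch {

    // Current tokenizer names; starts empty, like the no-argument constructor.
    private Vector<String> tokenizers = new Vector<>();

    // Hypothetical schema: <tokenizers><tokenizer>name</tokenizer>...</tokenizers>.
    // Returns the trimmed text of every <tokenizer> element in document order.
    static Vector<String> readTokenizerNames(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("tokenizer");
            Vector<String> names = new Vector<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                names.add(nodes.item(i).getTextContent().trim());
            }
            return names;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("cannot parse tokenizer config", e);
        }
    }

    // "Sets": replaces the whole list (copied so later adds leave the caller's vector alone).
    void setTokenizers(Vector<String> tokenVec) {
        tokenizers = new Vector<>(tokenVec);
    }

    // "Adds": appends the names found in the XML to the existing list.
    void addTokenizersFromXml(String xml) {
        tokenizers.addAll(readTokenizerNames(xml));
    }

    Vector<String> getTokenizers() {
        return tokenizers;
    }

    public static void main(String[] args) {
        TokenizerConfigSketch s = new TokenizerConfigSketch();
        s.setTokenizers(new Vector<>(Arrays.asList("lowercase")));
        s.addTokenizersFromXml("<tokenizers><tokenizer>stopwords</tokenizer></tokenizers>");
        System.out.println(s.getTokenizers()); // prints [lowercase, stopwords]
    }
}
```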
Modifier and Type | Method | Description |
---|---|---|
void | addTokenizers(java.lang.String config_file) | Adds the tokenizers specified in an XML file to the list of tokenizers to use on this document. |
java.util.Vector<java.lang.String> | getTokenizers() | Returns the list of tokenizers to use on this document. |
static void | main(java.lang.String[] args) | Tokenizes a text file and prints the resulting tokens to the screen. |
void | setTokenizers(java.util.Vector<java.lang.String> tokenVec) | Sets the tokenizers to use on this document. |
void | tokenize(java.lang.String filename, java.util.Vector<java.lang.String> tokenizerNames) | Splits the given file into tokens, then alters or eliminates those tokens according to the vector of tokenizers. |
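The tokenize(filename, tokenizerNames) entry describes a pipeline: split the input into tokens, then let each named tokenizer alter or eliminate them. A minimal sketch of that idea follows. It works on a String rather than a file, and the three tokenizer names, the whitespace split, and the convention that a step returns null to drop a token are all assumptions for illustration, not TokenizeFile's actual tokenizers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.Vector;
import java.util.function.UnaryOperator;

public class TokenChainSketch {

    private static final Set<String> STOP_WORDS = Set.of("the", "a", "of");

    // Hypothetical registry: each step maps a token to a new token, or to null to eliminate it.
    private static final Map<String, UnaryOperator<String>> REGISTRY = Map.of(
        "lowercase", t -> t.toLowerCase(Locale.ROOT),
        "strip_punct", t -> {
            String s = t.replaceAll("[^\\p{L}\\p{N}]", ""); // keep letters and digits only
            return s.isEmpty() ? null : s;
        },
        "stopwords", t -> STOP_WORDS.contains(t) ? null : t
    );

    // Splits on whitespace, then runs each token through the named steps in vector order.
    static List<String> tokenize(String text, Vector<String> tokenizerNames) {
        List<String> out = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            String token = raw;
            for (String name : tokenizerNames) {
                UnaryOperator<String> step = REGISTRY.get(name);
                if (step == null) {
                    throw new IllegalArgumentException("unknown tokenizer: " + name);
                }
                token = step.apply(token);
                if (token == null) {
                    break; // eliminated; skip the remaining steps
                }
            }
            if (token != null && !token.isEmpty()) {
                out.add(token);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Vector<String> names = new Vector<>(Arrays.asList("lowercase", "strip_punct", "stopwords"));
        System.out.println(tokenize("The tale of Two Cities!", names)); // prints [tale, two, cities]
    }
}
```

In this sketch the steps run in vector order, so listing stopwords before lowercase would let "The" through unchanged.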
Inherited methods: getPositionsVector, getTokenVector, iterator, position_iterator, printTokens, tokenize, tokenize, tokenize
public TokenizeFile()

public TokenizeFile(java.lang.String config_file)
Parameters:
config_file - the XML filename that specifies the tokenizers

public void setTokenizers(java.util.Vector<java.lang.String> tokenVec)
Parameters:
tokenVec - the string vector containing the tokenizers

public java.util.Vector<java.lang.String> getTokenizers()

public void addTokenizers(java.lang.String config_file)
Parameters:
config_file - the XML filename that specifies the tokenizers

public void tokenize(java.lang.String filename, java.util.Vector<java.lang.String> tokenizerNames)
Parameters:
filename - the name of the file to tokenize
tokenizerNames - the string vector containing the tokenizers to use

public static void main(java.lang.String[] args)
Parameters:
args - array of command-line arguments, where:
args[0] - the filename of the document to split into tokens
args[1] - the filename of the tokenizer configuration file
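main takes two positional arguments, the document and the tokenizer configuration file. The sketch below shows one way to validate them before any tokenizing happens; the class TokenizeMainSketch, the checkArgs helper, its exit codes, and its messages are hypothetical, and the tokenizing step itself is left out.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class TokenizeMainSketch {

    // Hypothetical exit codes: 0 = ok, 1 = usage error, 2 = unreadable file.
    static int checkArgs(String[] args) {
        if (args.length < 2) {
            System.err.println("usage: TokenizeFile <document> <tokenizer-config.xml>");
            return 1;
        }
        Path document = Paths.get(args[0]); // args[0]: document to split into tokens
        Path config = Paths.get(args[1]);   // args[1]: tokenizer configuration file
        for (Path p : new Path[] {document, config}) {
            if (!Files.isReadable(p)) {
                System.err.println("cannot read " + p);
                return 2;
            }
        }
        return 0;
    }

    public static void main(String[] args) {
        int status = checkArgs(args);
        if (status != 0) {
            System.exit(status);
        }
        // The real class would now load tokenizers from args[1] and tokenize args[0];
        // that part is intentionally omitted from this sketch.
        System.out.println("document=" + args[0] + ", config=" + args[1]);
    }
}
```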