public class TokenizerList extends Tokenizer
Fields inherited from class Tokenizer: tokenVectorMap
Constructor and Description |
---|
TokenizerList(): Constructor that initializes an empty list of tokenizers |
TokenizerList(java.util.List<java.lang.String> tokenizerNames): Constructor that takes a list of the tokenizer names |
Modifier and Type | Method and Description |
---|---|
void | addTokenizer(java.lang.String tokenizerName): Adds a tokenizer to the end of this list of tokenizers |
void | disableMangler(): Disables the mangler in the FileManglerTokenizer object in this list |
void | enableMangler(java.lang.String settings, java.util.List<Token> tokens): Enables the mangler with the given settings in the FileManglerTokenizer object in this list |
java.util.List<java.lang.String> | getNames(): Provides the ordered list of the tokenizer names in this list |
java.util.List<Tokenizer> | getTokenizers() |
void | replaceFileTokenizer(java.lang.String strTokenizer) |
void | setManglerRNG(java.util.Random random): Sets the random number generator to use with a FileManglerTokenizer object in this list |
void | setSplitter(java.lang.String splitter): Sets the splitter for this list |
java.util.Map<java.lang.String,java.util.List<Token>> | tokenizeFile(java.lang.String filename): Applies each tokenizer from this list, in order, on the file; the first tokenizer must be able to read from a file and the tokenizers must already be instantiated |
java.util.Map<java.lang.String,java.util.List<Token>> | tokenizeString(java.lang.String str) |
java.lang.String | toString() |
Methods inherited from class Tokenizer: getTokenVectorMap, iterator, printTokens, tokenize, tokenize
public TokenizerList()
public TokenizerList(java.util.List<java.lang.String> tokenizerNames)
tokenizerNames - the string names of the tokenizers to use; the first tokenizer must be able to read from a file
public void enableMangler(java.lang.String settings, java.util.List<Token> tokens)
settings - the string containing the mangler settings
tokens - the list of tokens to put through this mangler
public void disableMangler()
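Because the mangler's random number generator is supplied by the caller as a java.util.Random, a seeded generator should make mangling repeatable. The sketch below illustrates only that general idea: SeededMangleSketch and its mangle method are hypothetical stand-ins, and shuffling is an assumed placeholder for whatever the FileManglerTokenizer actually does, which this page does not describe.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical stand-in, NOT the library's FileManglerTokenizer.
public class SeededMangleSketch {

    // Assumed placeholder for "mangling": shuffles a copy of the tokens
    // using the caller-supplied random number generator.
    public static List<String> mangle(List<String> tokens, Random random) {
        List<String> copy = new ArrayList<>(tokens);
        Collections.shuffle(copy, random);
        return copy;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("int", "x", "=", "42", ";");

        // Two generators built from the same seed produce the same sequence,
        // so both runs yield the same ordering.
        List<String> first = mangle(tokens, new Random(7L));
        List<String> second = mangle(tokens, new Random(7L));

        System.out.println(first.equals(second)); // prints "true"
    }
}
```

With the real class, the analogous step would presumably be to pass a seeded `java.util.Random` to `setManglerRNG` before tokenizing, so that repeated runs can be compared.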
public void setManglerRNG(java.util.Random random)
random - the random number generator to use if there is a FileManglerTokenizer object in this list
public void setSplitter(java.lang.String splitter)
splitter - the string containing the splitter to use; null if not splitting files
public void addTokenizer(java.lang.String tokenizerName)
tokenizerName - the string name of the tokenizer to be added to the end of this list
public java.util.List<Tokenizer> getTokenizers()
public void replaceFileTokenizer(java.lang.String strTokenizer)
strTokenizer - the name of the tokenizer that replaces the fileTokenizer
public java.util.List<java.lang.String> getNames()
public java.util.Map<java.lang.String,java.util.List<Token>> tokenizeFile(java.lang.String filename)
filename - the string name of the file to tokenize
public java.util.Map<java.lang.String,java.util.List<Token>> tokenizeString(java.lang.String str)
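The "applies each tokenizer from this list, in order" behavior described for tokenizeFile can be pictured as a chain in which each stage consumes the previous stage's output. The following is a minimal sketch of that pattern only. TokenizerChainSketch, addStage, run, and the two stages are hypothetical names invented for illustration; the real class works with Tokenizer and Token objects and returns a map whose keys this page does not specify.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

// Hypothetical stand-in, NOT the library's TokenizerList.
public class TokenizerChainSketch {

    // LinkedHashMap keeps stages in insertion order, so they run in the
    // order they were added and getNames() reports that same order.
    private final Map<String, Function<List<String>, List<String>>> stages =
            new LinkedHashMap<>();

    // Adds a named stage to the end of the chain.
    public void addStage(String name, Function<List<String>, List<String>> stage) {
        stages.put(name, stage);
    }

    // Returns the stage names in the order they will be applied.
    public List<String> getNames() {
        return new ArrayList<>(stages.keySet());
    }

    // Feeds the input through every stage, in order.
    public List<String> run(String input) {
        List<String> tokens = new ArrayList<>();
        tokens.add(input);
        for (Function<List<String>, List<String>> stage : stages.values()) {
            tokens = stage.apply(tokens);
        }
        return tokens;
    }

    // Builds a two-stage chain: split on whitespace, then lowercase.
    public static TokenizerChainSketch demo() {
        TokenizerChainSketch chain = new TokenizerChainSketch();
        chain.addStage("whitespace", tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) {
                for (String piece : t.trim().split("\\s+")) {
                    if (!piece.isEmpty()) {
                        out.add(piece);
                    }
                }
            }
            return out;
        });
        chain.addStage("lowercase", tokens -> {
            List<String> out = new ArrayList<>();
            for (String t : tokens) {
                out.add(t.toLowerCase(Locale.ROOT));
            }
            return out;
        });
        return chain;
    }

    public static void main(String[] args) {
        TokenizerChainSketch chain = demo();
        System.out.println(chain.getNames());         // prints "[whitespace, lowercase]"
        System.out.println(chain.run("Hello World")); // prints "[hello, world]"
    }
}
```

The first stage is the only one that sees the raw input, which mirrors the documented requirement that the first tokenizer in the list be able to read from a file.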