public class StopWordRemoverTokenizer extends Tokenizer
Fields inherited from class Tokenizer: positions, tokenVector
Constructor and Description |
---|
StopWordRemoverTokenizer(java.lang.String stopwordFilename) Constructor that loads the stop words to use when the tokenize method is called. |
Modifier and Type | Method and Description |
---|---|
void | loadStopWords(java.lang.String stopwordFilename) Loads the words to eliminate when the tokenize method is called. |
void | tokenize(java.util.Iterator<java.lang.String> iterator) Eliminates the tokens specified in the given stop words document. |
void | tokenize(java.util.Iterator<java.lang.String> tokensIterator, java.util.Iterator<Pair<java.lang.Integer,java.lang.Integer>> positionsIterator) Eliminates the tokens specified in the given stop words document. |
Methods inherited from class Tokenizer: getPositionsVector, getTokenVector, iterator, position_iterator, printTokens, tokenize
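The summary above describes tokenize as dropping every token that appears in the loaded stop-word list. The sketch below illustrates that idea only. StopWordFilterSketch and its filter method are hypothetical stand-ins, not part of this library, and the sketch returns the kept tokens directly because the way the real class stores its results is not shown here.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in, NOT the library class: illustrates stop-word filtering only.
class StopWordFilterSketch {
    private final Set<String> stopWords;

    StopWordFilterSketch(Set<String> stopWords) {
        // Defensive copy so later changes to the caller's set do not affect filtering.
        this.stopWords = new HashSet<>(stopWords);
    }

    // Consumes the iterator and returns, in order, the tokens that are not stop words.
    List<String> filter(Iterator<String> tokens) {
        List<String> kept = new ArrayList<>();
        while (tokens.hasNext()) {
            String token = tokens.next();
            if (!stopWords.contains(token)) {
                kept.add(token);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        StopWordFilterSketch sketch =
                new StopWordFilterSketch(new HashSet<>(Arrays.asList("the", "of")));
        List<String> kept =
                sketch.filter(Arrays.asList("the", "history", "of", "java").iterator());
        System.out.println(kept);
    }
}
```

The comparison is an exact string match, so the sketch relies on the same assumption the documentation states for the stop-word file: tokens and stop words are both lowercase and unstemmed.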
public StopWordRemoverTokenizer(java.lang.String stopwordFilename)
Constructor that loads the stop words to use when the tokenize method is called.
Parameters:
stopwordFilename - the filename of the document containing the stop words to use; assumes the document contains lowercase, non-porterized English words

public void loadStopWords(java.lang.String stopwordFilename)
Loads the words to eliminate when the tokenize method is called.
Parameters:
stopwordFilename - the filename of the document containing the stop words to use; assumes the document contains lowercase, non-porterized English words

public void tokenize(java.util.Iterator<java.lang.String> iterator)
Eliminates the tokens specified in the given stop words document.

public void tokenize(java.util.Iterator<java.lang.String> tokensIterator, java.util.Iterator<Pair<java.lang.Integer,java.lang.Integer>> positionsIterator)
Eliminates the tokens specified in the given stop words document.
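The two-argument tokenize takes a second iterator of position pairs alongside the tokens. A minimal sketch of how such a filter could keep each surviving token together with its pair is given below. Everything in it is an assumption made for illustration: the class PositionedStopWordSketch, the one-word-per-line stop-word file layout, the int[] used in place of the library's Pair type, and the reading of each pair as two integer offsets.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in, NOT the library class.
class PositionedStopWordSketch {

    // A kept token plus its two position values (assumed meaning: offsets).
    static final class PositionedToken {
        final String token;
        final int first;
        final int second;

        PositionedToken(String token, int first, int second) {
            this.token = token;
            this.first = first;
            this.second = second;
        }
    }

    // Assumed file layout: one stop word per line; blank lines are skipped.
    static Set<String> parseStopWords(List<String> lines) {
        Set<String> words = new HashSet<>();
        for (String line : lines) {
            String word = line.trim();
            if (!word.isEmpty()) {
                words.add(word);
            }
        }
        return words;
    }

    // Reads the stop-word file as UTF-8 and delegates to parseStopWords.
    static Set<String> loadStopWords(String stopwordFilename) {
        try {
            return parseStopWords(
                    Files.readAllLines(Paths.get(stopwordFilename), StandardCharsets.UTF_8));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Advances both iterators in lockstep so a dropped token also drops its position pair.
    static List<PositionedToken> filter(Set<String> stopWords,
                                        Iterator<String> tokens,
                                        Iterator<int[]> positions) {
        List<PositionedToken> kept = new ArrayList<>();
        while (tokens.hasNext() && positions.hasNext()) {
            String token = tokens.next();
            int[] pos = positions.next();
            if (!stopWords.contains(token)) {
                kept.add(new PositionedToken(token, pos[0], pos[1]));
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stopWords = parseStopWords(Arrays.asList("the", "", " of "));
        List<int[]> positions = Arrays.asList(
                new int[] {0, 3}, new int[] {4, 11}, new int[] {12, 14}, new int[] {15, 19});
        for (PositionedToken t : filter(stopWords,
                Arrays.asList("the", "history", "of", "java").iterator(),
                positions.iterator())) {
            System.out.println(t.token + " [" + t.first + ", " + t.second + "]");
        }
    }
}
```

Reading both iterators inside the same loop iteration is the key design point: positions stay aligned with tokens even when stop words are removed from the middle of the stream.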