public class StripMarkupTokenizer extends Tokenizer
Fields inherited from class Tokenizer: tokenVectorMap
Constructor and Description |
---|
StripMarkupTokenizer(java.lang.String keepScript) Constructor that specifies whether to keep tokens nested inside script tags. The default is to eliminate these tokens. |
Modifier and Type | Method and Description |
---|---|
java.util.List<Token> | tokenize(java.util.List<Token> tokens) Eliminates tokens nested inside markup language tags. Assumes that tokens have been split by line rather than on whitespace. |
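
The behavior described for tokenize can be illustrated with a minimal, self-contained sketch. It is not the library's implementation: the class `StripMarkupSketch`, the method `stripMarkup`, and the use of plain `String` values in place of `Token` objects are all hypothetical stand-ins, and the sketch assumes that "nested inside markup language tags" means the text between `<` and `>`, with script content dropped unless `keepScript` is true.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not part of the documented API.
public class StripMarkupSketch {

    /**
     * Removes markup tags from line-level tokens. Each element of {@code lines}
     * stands in for one Token that was produced by splitting on line breaks.
     * Content between script tags is kept only when {@code keepScript} is true.
     * Simplification: any '<' is treated as the start of a tag, even in script code.
     */
    public static List<String> stripMarkup(List<String> lines, boolean keepScript) {
        List<String> result = new ArrayList<>();
        StringBuilder tag = new StringBuilder(); // text of the tag being read
        boolean inTag = false;                   // between '<' and '>'
        boolean inScript = false;                // between <script> and </script>

        for (String line : lines) {
            StringBuilder kept = new StringBuilder();
            for (int i = 0; i < line.length(); i++) {
                char c = line.charAt(i);
                if (inTag) {
                    if (c == '>') {
                        // Tag finished: update the script state from its name.
                        String name = tag.toString().trim().toLowerCase();
                        if (name.startsWith("/script")) {
                            inScript = false;
                        } else if (name.startsWith("script")) {
                            inScript = true;
                        }
                        tag.setLength(0);
                        inTag = false;
                    } else {
                        tag.append(c);
                    }
                } else if (c == '<') {
                    inTag = true;
                } else if (!inScript || keepScript) {
                    kept.append(c);
                }
            }
            String text = kept.toString().trim();
            if (!text.isEmpty()) {
                result.add(text); // lines that held only markup are dropped
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "<html>", "<p>Hello world</p>", "<script>", "var x = 1;", "</script>", "Goodbye");
        System.out.println(stripMarkup(lines, false)); // [Hello world, Goodbye]
        System.out.println(stripMarkup(lines, true));  // [Hello world, var x = 1;, Goodbye]
    }
}
```

Because the tag and script state is carried from one line to the next, the sketch depends on line-level input in the same way the method description assumes line-split tokens.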
Methods inherited from class Tokenizer: getTokenVectorMap, iterator, printTokens, tokenize, toString
public StripMarkupTokenizer(java.lang.String keepScript)
Parameters:
keepScript - the string value (true or false) specifying whether to keep tokens nested inside script tags
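
Since keepScript is passed as a string rather than a boolean, it may help to see how such a flag is commonly interpreted in Java. The sketch below assumes the value is read the way the standard `Boolean.parseBoolean` reads it; the documentation does not say how the class actually parses the string, and `KeepScriptFlagSketch` and `parseKeepScript` are hypothetical names.

```java
// Hypothetical illustration only; the real parsing logic is not documented.
public class KeepScriptFlagSketch {

    /**
     * Interprets a string flag as a boolean. Boolean.parseBoolean returns true
     * only for "true" (ignoring case) and false for anything else, including null.
     */
    public static boolean parseKeepScript(String keepScript) {
        return Boolean.parseBoolean(keepScript);
    }

    public static void main(String[] args) {
        System.out.println(parseKeepScript("true"));  // true
        System.out.println(parseKeepScript("TRUE"));  // true
        System.out.println(parseKeepScript("false")); // false
        System.out.println(parseKeepScript("yes"));   // false
        System.out.println(parseKeepScript(null));    // false
    }
}
```

Under this assumption, any value other than "true" behaves like "false", so passing exactly "true" or "false" as the parameter description suggests avoids ambiguity.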