java.lang.Object
  edu.georgetown.gucs.experiment.DuplicateFileFinderWorker
public class DuplicateFileFinderWorker
extends java.lang.Object
implements java.lang.Runnable
Worker that lets DuplicateFileFinder process a source file on its own thread
Constructor Summary | |
---|---|
DuplicateFileFinderWorker(Dictionary dictionary, java.io.File source, java.util.List<java.lang.String> tokenizerNames, double min, double max, DuplicateFileFinder parent) | Constructor that sets all necessary information |
Method Summary | | |
---|---|---|
void | run() | Tokenizes the file and stores duplicate information |
java.lang.String | SHA1(java.util.Iterator<java.lang.String> tokens) | Provides a hexadecimal string representation of the SHA1 hash of a set of tokens |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public DuplicateFileFinderWorker(Dictionary dictionary, java.io.File source, java.util.List<java.lang.String> tokenizerNames, double min, double max, DuplicateFileFinder parent)
Constructor that sets all necessary information
Parameters:
dictionary - the dictionary of terms to use
source - the source file to check for duplicates
tokenizerNames - the list of tokenizer names to use on the source file
min - the double minimum normalized IDF to keep; any token with a lower IDF will be discarded
max - the double maximum normalized IDF to keep; any token with a higher IDF will be discarded
parent - the DuplicateFileFinder object

Method Detail |
---|
public void run()
Tokenizes the file and stores duplicate information
Specified by:
run in interface java.lang.Runnable
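Because run() is specified by java.lang.Runnable, a worker of this kind can be handed to a thread or an executor. The following is only a sketch under stated assumptions: FilterWorker is a hypothetical stand-in (the real constructor needs Dictionary and DuplicateFileFinder objects that are not documented on this page), the IDF scores are made-up values, and the inclusive range check is one plausible reading of the min and max parameters, not the class's confirmed behavior.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class WorkerPoolSketch {

    /** Hypothetical stand-in for DuplicateFileFinderWorker; NOT the real class. */
    static final class FilterWorker implements Runnable {
        private final String fileName;
        private final List<String> tokens;
        private final Map<String, Double> idf;            // assumed: token -> normalized IDF
        private final double min;
        private final double max;
        private final Map<String, List<String>> results;  // thread-safe sink shared by all workers

        FilterWorker(String fileName, List<String> tokens, Map<String, Double> idf,
                     double min, double max, Map<String, List<String>> results) {
            this.fileName = fileName;
            this.tokens = tokens;
            this.idf = idf;
            this.min = min;
            this.max = max;
            this.results = results;
        }

        @Override
        public void run() {
            List<String> kept = new ArrayList<>();
            for (String token : tokens) {
                Double score = idf.get(token);
                // Drop unknown tokens and tokens whose IDF falls outside [min, max].
                if (score != null && score >= min && score <= max) {
                    kept.add(token);
                }
            }
            results.put(fileName, kept);
        }
    }

    /** Runs one worker per file on a small fixed pool and waits for all of them. */
    static Map<String, List<String>> runAll(Map<String, List<String>> files,
                                            Map<String, Double> idf,
                                            double min, double max) {
        Map<String, List<String>> results = new ConcurrentHashMap<>();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (Map.Entry<String, List<String>> file : files.entrySet()) {
            pool.execute(new FilterWorker(file.getKey(), file.getValue(), idf, min, max, results));
        }
        pool.shutdown(); // accept no new tasks; already submitted workers still run
        try {
            if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return results;
    }

    public static void main(String[] args) {
        Map<String, Double> idf = new HashMap<>();
        idf.put("the", 0.05);      // illustrative scores only
        idf.put("quick", 0.5);
        idf.put("fox", 0.6);
        idf.put("zyzzyva", 0.99);

        Map<String, List<String>> files = new HashMap<>();
        files.put("a.txt", Arrays.asList("the", "quick", "zyzzyva"));
        files.put("b.txt", Arrays.asList("quick", "fox"));

        System.out.println(runAll(files, idf, 0.1, 0.9));
    }
}
```

A shared ConcurrentHashMap is used here so that workers can store results without extra locking; how the real class reports back to its parent DuplicateFileFinder is not stated on this page.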
public java.lang.String SHA1(java.util.Iterator<java.lang.String> tokens)
Provides a hexadecimal string representation for SHA1 hash of a set of tokens
Parameters:
tokens - the string iterator over the tokens
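One way a method like SHA1 could turn a token iterator into a hexadecimal digest is sketched below using the standard java.security.MessageDigest API. The helper name sha1Hex, the UTF-8 encoding, and the choice to feed tokens into the digest back to back with no separator are all assumptions for illustration; this page does not say how the actual method combines tokens.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Iterator;

final class Sha1Sketch {

    /** Hypothetical helper: hashes the tokens in iteration order and returns lowercase hex. */
    static String sha1Hex(Iterator<String> tokens) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            // Every conforming Java platform is required to provide SHA-1.
            throw new IllegalStateException("SHA-1 not available", e);
        }
        while (tokens.hasNext()) {
            // Assumption: tokens are concatenated with no delimiter, encoded as UTF-8.
            digest.update(tokens.next().getBytes(StandardCharsets.UTF_8));
        }
        byte[] hash = digest.digest();
        StringBuilder hex = new StringBuilder(hash.length * 2);
        for (byte b : hash) {
            // Mask to 0..255 so negative bytes print as two hex digits.
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        // Same bytes as the string "abc":
        // prints a9993e364706816aba3e25717850c26c9cd0d89d
        System.out.println(sha1Hex(Arrays.asList("a", "b", "c").iterator()));
    }
}
```

With no separator, the token sequences ["ab", "c"] and ["a", "bc"] produce the same digest. If token boundaries should affect the hash, a delimiter byte could be added between tokens.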