edu.georgetown.gucs.experiment
Class DuplicateFileFinderWorker

java.lang.Object
  extended by edu.georgetown.gucs.experiment.DuplicateFileFinderWorker
All Implemented Interfaces:
java.lang.Runnable

public class DuplicateFileFinderWorker
extends java.lang.Object
implements java.lang.Runnable

A Runnable worker that allows DuplicateFileFinder to process files on separate threads.

Author:
Clay Shields

Constructor Summary
DuplicateFileFinderWorker(Dictionary dictionary, java.io.File source, java.util.List<java.lang.String> tokenizerNames, double min, double max, DuplicateFileFinder parent)
          Creates a worker with the dictionary, source file, tokenizers, IDF bounds, and parent finder it needs
 
Method Summary
 void run()
          Tokenizes the file and stores duplicate information
 java.lang.String SHA1(java.util.Iterator<java.lang.String> tokens)
          Returns the hexadecimal string representation of the SHA-1 hash of a sequence of tokens
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

DuplicateFileFinderWorker

public DuplicateFileFinderWorker(Dictionary dictionary,
                                 java.io.File source,
                                 java.util.List<java.lang.String> tokenizerNames,
                                 double min,
                                 double max,
                                 DuplicateFileFinder parent)
Creates a worker with the dictionary, source file, tokenizers, IDF bounds, and parent finder it needs

Parameters:
dictionary - the dictionary of terms to use
source - the source file to check for duplicates
tokenizerNames - the list of tokenizer strings to use on the source file
min - the minimum normalized IDF to keep; any token with a lower IDF will be discarded
max - the maximum normalized IDF to keep; any token with a higher IDF will be discarded
parent - the parent DuplicateFileFinder object
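
The effect of the min and max parameters can be illustrated with a small sketch. This is not the class's actual code: the IdfFilterSketch class, the keepInRange method, and the map of precomputed normalized IDF values are all hypothetical stand-ins, and the bounds are assumed to be inclusive, since the parameter descriptions only say that lower and higher values are discarded.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration of the min/max IDF bounds; not part of the documented API.
public class IdfFilterSketch {

    // Keeps tokens whose normalized IDF lies in [min, max]; all others are discarded.
    public static List<String> keepInRange(Map<String, Double> idfByToken, double min, double max) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Double> entry : idfByToken.entrySet()) {
            double idf = entry.getValue();
            if (idf >= min && idf <= max) {
                kept.add(entry.getKey());
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Made-up IDF values, chosen only to show one token below, inside, and above the range.
        Map<String, Double> idfByToken = new LinkedHashMap<>();
        idfByToken.put("the", 0.05);
        idfByToken.put("report", 0.40);
        idfByToken.put("xq7z", 0.98);
        System.out.println(keepInRange(idfByToken, 0.1, 0.9)); // prints [report]
    }
}
```

Under these assumptions, very common tokens (low IDF) and very rare tokens (high IDF) are both dropped before any duplicate comparison takes place.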
Method Detail

run

public void run()
Tokenizes the file and stores duplicate information

Specified by:
run in interface java.lang.Runnable
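
Because the class implements java.lang.Runnable, instances can be handed to any standard executor. The sketch below shows one plausible way a caller might run several workers concurrently; it is an assumption about usage, not a description of how DuplicateFileFinder schedules its workers. WorkerPoolSketch and runAll are hypothetical names, and plain Runnable lambdas stand in for real DuplicateFileFinderWorker instances.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of running Runnable workers on a fixed-size thread pool.
public class WorkerPoolSketch {

    // Runs every worker on the pool and returns true if all finished within one minute.
    public static boolean runAll(List<Runnable> workers, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (Runnable worker : workers) {
            pool.execute(worker);
        }
        pool.shutdown(); // accept no new tasks; already queued tasks still run
        try {
            return pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // restore the interrupt flag
            pool.shutdownNow();
            return false;
        }
    }

    public static void main(String[] args) {
        // Stand-in tasks; in real use each entry would be one worker per source file.
        List<Runnable> workers = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            final int id = i;
            workers.add(() -> System.out.println("worker " + id + " finished"));
        }
        System.out.println("all done: " + runAll(workers, 2));
    }
}
```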

SHA1

public java.lang.String SHA1(java.util.Iterator<java.lang.String> tokens)
Returns the hexadecimal string representation of the SHA-1 hash of a sequence of tokens

Parameters:
tokens - an iterator over the token strings to hash
Returns:
the SHA-1 hash value as a hexadecimal string
See Also:
Java simple class to compute SHA-1 hash
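
A method with this signature could be written with the standard java.security.MessageDigest API, as in the minimal sketch below. It is not the class's actual implementation: TokenSha1Sketch and sha1Hex are hypothetical names, and the sketch assumes that tokens are fed to the digest in iteration order, encoded as UTF-8, with no separator between them. The real method may combine tokens differently.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical illustration of hashing a token sequence with SHA-1.
public class TokenSha1Sketch {

    // Returns the lowercase hexadecimal SHA-1 digest of the tokens, consumed in iteration order.
    public static String sha1Hex(Iterator<String> tokens) {
        MessageDigest digest;
        try {
            digest = MessageDigest.getInstance("SHA-1");
        } catch (NoSuchAlgorithmException e) {
            // Every compliant Java platform is required to provide SHA-1.
            throw new IllegalStateException("SHA-1 is not available", e);
        }
        while (tokens.hasNext()) {
            // Assumption: UTF-8 bytes, no delimiter between tokens.
            digest.update(tokens.next().getBytes(StandardCharsets.UTF_8));
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : digest.digest()) {
            hex.append(String.format("%02x", b)); // two hex digits per byte
        }
        return hex.toString();
    }

    public static void main(String[] args) {
        Iterator<String> tokens = Arrays.asList("a", "b", "c").iterator();
        System.out.println(sha1Hex(tokens)); // prints a9993e364706816aba3e25717850c26c9cd0d89d
    }
}
```

Because no separator is used in this sketch, the token sequences ("a", "b", "c") and ("abc") hash to the same value; an implementation that needs to distinguish them would have to add a delimiter or length prefix.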