public class Fingerprinter
extends java.lang.Object
mangler, which is used to add errors or noise to the files being processed. This allows the input dictionary to be
modified on the fly, so that fingerprinters can be tested for accuracy in the presence of errors.

Modifier and Type | Field and Description |
---|---|
protected LongFastBloomFilter | base64BloomFilter - the bloom filter stored in the fingerprintFile |
protected java.util.List<Fingerprint> | base64Fingerprints - the Base64 encoded string containing the fingerprint |
protected LongFastBloomFilter | bloomFilter - the bloom filter that stores each token to determine a word's presence |
protected java.lang.String | bloomFilterSize - the size of the bloomFilter stored in XML |
protected java.lang.String | byteRun - the byte run for the fingerprinted document |
protected java.lang.String | creatingProgram - the name of the program that created the fingerprint |
protected java.lang.String | creator - the person or organization that created this fingerprint |
protected Dictionary | dictionary - the dictionary to use to create this fingerprinter |
protected java.lang.String | diskImage - the disk image identifier where the fingerprinted document is stored |
protected java.lang.String | fileName |
protected java.lang.String | fingerprintName - the name of this fingerprinter |
protected java.lang.String | GUID - the globally unique identifier for this fingerprinter |
protected java.lang.String | manglerName - the name of the current mangler, if any |
protected boolean | manglerOn |
protected boolean | showBloomFilter - whether or not the bloom filter exists |
protected boolean | showDataSource - whether to store data source information in the output for the fingerprint |
protected boolean | showDictionary - whether to store the full dictionary used for the fingerprint in the output |
protected boolean | showDigest - whether to store digest information in the output for the fingerprint |
protected java.lang.String | systemID - the fully qualified domain name for the local host IP address |
protected java.lang.String | targetFile - the filename of the document this fingerprint is for |
protected java.util.List<java.lang.String> | unknownTokens - the list of unknown tokens contained in this fingerprint |
protected int | version - the version number of this fingerprint |
protected java.lang.String | volume - the volume where the fingerprinted document is stored |
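
The bloomFilter field above is described as storing each token to determine a word's presence. The following is a minimal sketch of that idea, not the library's LongFastBloomFilter: the class name TokenBloomSketch, the FNV-1a-style hashing, and the size and hashCount parameters are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

/** Hypothetical stand-in for illustration; not the library's LongFastBloomFilter. */
final class TokenBloomSketch {
    private final BitSet bits;
    private final int size;      // number of bits in the filter (assumed parameter)
    private final int hashCount; // number of hash probes per token (assumed parameter)

    TokenBloomSketch(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** FNV-1a-style 64-bit hash, mixed with the probe number, reduced to a bit index. */
    private int index(String token, int probe) {
        long h = 0xcbf29ce484222325L ^ (probe * 0x9E3779B97F4A7C15L);
        for (byte b : token.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return (int) Long.remainderUnsigned(h, size);
    }

    /** Records a token by setting one bit per probe. */
    void add(String token) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(token, i));
        }
    }

    /** Returns false only if the token was never added; true may be a false positive. */
    boolean mightContain(String token) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(token, i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        TokenBloomSketch filter = new TokenBloomSketch(1024, 3);
        filter.add("alpha");
        filter.add("beta");
        System.out.println(filter.mightContain("alpha")); // prints true
    }
}
```

A filter of this kind never reports an added token as absent, but it can report an absent token as present, so a positive answer is only probabilistic.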

Constructor and Description |
---|
Fingerprinter() - Constructor that generates the fingerprint name, version, unique identifier (GUID), system identifier, and creating program. |
Fingerprinter(Dictionary dict) - Constructor that loads a dictionary and its tokenizers. |
Fingerprinter(java.lang.String dictionaryFilename) - Constructor that loads a dictionary and its tokenizers. |
Fingerprinter(java.lang.String fingerprintFilename, java.lang.String dictionaryFilename) - Loads a previously generated fingerprinter digest from fingerprinter and dictionary XML files. |

Modifier and Type | Method and Description |
---|---|
LongFastBloomFilter | addBloomFilter(java.util.List<Token> tokenList) - Creates a bloomFilter using a set of tokens |
java.util.List<Fingerprint> | computeFingerprint(java.lang.String filename) - Computes the fingerprint of this document as a byte array; indicates the presence or absence of each token in this dictionary |
java.util.List<Fingerprint> | computeFingerprint(TokenizerList list, java.lang.String str) - Computes the fingerprint of this document as a byte array; indicates the presence or absence of each token in this dictionary |
java.util.List<Fingerprint> | computeFingerprintXML(java.lang.String filename) - Computes the fingerprint of a document as a Base64 encoded string; indicates the presence or absence of each token in this dictionary |
java.lang.String | generateCreatingProgram() - Determines the program that created this fingerprinter |
org.jdom.Document | generateXML(java.util.List<Fingerprint> fingerprintList, java.lang.String fileName) |
java.lang.String | generateXML(java.util.List<Fingerprint> fingerprintPairList, java.lang.String document, java.lang.String outputFile) - Generates an XML file that contains this fingerprinter's digest |
java.util.List<Fingerprint> | getBase64Fingerprints() - Gives this fingerprinter's fingerprint in Base64 encoding |
Dictionary | getDictionary() - Gives this fingerprinter's dictionary |
java.lang.String | getFileName() |
java.lang.String | getFingerprintName() - Gives this fingerprinter's name. |
java.lang.String | getMangler() - Returns the string of the current mangler setting |
void | outputFields(java.lang.String config_file) - Sets the digest output to generate for this fingerprinter |
void | setDictionary(Dictionary dict) - Sets the dictionary for this fingerprinter. |
void | setDictionary(java.lang.String filename) - Sets the dictionary for this fingerprinter. |
void | setMangler(boolean manglerOn) - Added by evan: removes the manglerToken |
void | setMangler(java.lang.String settings, Dictionary dictionary) - Passes the specified mangler settings and a set of tokens to the mangler for this fingerprinter. |
void | setManglerRNG(java.util.Random random) - Sets a random number generator for the mangler to allow for repeatability. |
void | setOutput(boolean showDigest, boolean showDictionary, boolean showDataSource) - Specifies which information to display in this fingerprinter's digest XML output |
void | setSplitter(java.lang.String splitter) - Sets the splitter for this fingerprinter |
void | setTerse() - Sets this fingerprinter's digest XML output to only display file and fingerprint information |
void | setVerbose() - Sets this fingerprinter's digest XML output to display all available information |
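
The computeFingerprint and computeFingerprintXML entries above describe a fingerprint that indicates the presence or absence of each dictionary token, available as a byte array or as a Base64 encoded string. The sketch below shows one way such an encoding could work. It is an assumption-laden illustration, not the library's method: the class PresenceFingerprintSketch, the whitespace-and-punctuation tokenization, the lowercase dictionary, and the most-significant-bit-first layout are all hypothetical.

```java
import java.util.Arrays;
import java.util.Base64;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/** Hypothetical illustration of a token-presence fingerprint; not the library's code. */
final class PresenceFingerprintSketch {

    /** Returns one bit per dictionary token, set when that token occurs in the text. */
    static byte[] presenceBits(List<String> dictionaryTokens, String text) {
        // Assumed tokenization: lowercase the text and split on non-word characters.
        Set<String> seen =
                new HashSet<>(Arrays.asList(text.toLowerCase(Locale.ROOT).split("\\W+")));
        byte[] bits = new byte[(dictionaryTokens.size() + 7) / 8];
        for (int i = 0; i < dictionaryTokens.size(); i++) {
            if (seen.contains(dictionaryTokens.get(i))) {
                bits[i / 8] |= (byte) (0x80 >> (i % 8)); // most significant bit first
            }
        }
        return bits;
    }

    /** Encodes the presence bits as a Base64 string, suitable for embedding in XML. */
    static String toBase64(byte[] bits) {
        return Base64.getEncoder().encodeToString(bits);
    }

    public static void main(String[] args) {
        List<String> dictionary = Arrays.asList("cat", "dog", "fish");
        byte[] bits = presenceBits(dictionary, "The dog saw a fish.");
        // "cat" absent, "dog" and "fish" present: bit pattern 01100000
        System.out.println(toBase64(bits)); // prints YA==
    }
}
```

Base64 keeps the binary presence vector safe to store as text, which matches the summary's description of the XML output carrying an encoded fingerprint.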
protected Dictionary dictionary
protected java.util.List<Fingerprint> base64Fingerprints
protected java.lang.String fingerprintName
protected java.lang.String GUID
protected java.lang.String systemID
protected java.lang.String creatingProgram
protected java.lang.String creator
protected java.lang.String targetFile
protected int version
protected java.lang.String diskImage
protected java.lang.String volume
protected java.lang.String byteRun
protected java.util.List<java.lang.String> unknownTokens
protected boolean showDigest
protected boolean showDataSource
protected boolean showDictionary
protected boolean showBloomFilter
protected java.lang.String manglerName
protected LongFastBloomFilter bloomFilter
protected LongFastBloomFilter base64BloomFilter
protected java.lang.String bloomFilterSize
protected boolean manglerOn
protected java.lang.String fileName
public Fingerprinter()
public Fingerprinter(Dictionary dict)
dict
- the dictionary to use to create this fingerprinter
public Fingerprinter(java.lang.String dictionaryFilename)
dictionaryFilename
- the filename containing the dictionary to use for this fingerprinter
public Fingerprinter(java.lang.String fingerprintFilename, java.lang.String dictionaryFilename)
fingerprintFilename
- the string name of the XML file containing the fingerprinter's digest
dictionaryFilename
- the name of the XML file containing the dictionary
public java.lang.String getFileName()
public void setMangler(boolean manglerOn)
manglerOn
- sets the mangler status
public void setOutput(boolean showDigest, boolean showDictionary, boolean showDataSource)
showDigest
- the boolean value specifying whether to display digest information
showDataSource
- the boolean value specifying whether to display data source information
showDictionary
- the boolean value specifying whether to display dictionary information
public void setTerse()
public void setVerbose()
public void setMangler(java.lang.String settings, Dictionary dictionary)
settings
- the string of mangler settings for this fingerprinter
dictionary
- the dictionary to use to pass dictionary tokens to the mangler
public java.lang.String getMangler()
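
The mangler is described as adding errors or noise to the files being processed, and setManglerRNG accepts a java.util.Random so that runs can be repeated. The sketch below illustrates why a seeded generator gives repeatable noise. The class SeededManglerSketch, its letter-substitution strategy, and the errorRate parameter are hypothetical; the source does not say how the actual mangler alters text or what its settings string contains.

```java
import java.util.Random;

/** Hypothetical noise injector for illustration; not the library's mangler. */
final class SeededManglerSketch {

    /**
     * Replaces each letter with a random lowercase letter with probability errorRate.
     * Non-letter characters are copied unchanged, so the length is preserved.
     */
    static String mangle(String text, double errorRate, Random rng) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c) && rng.nextDouble() < errorRate) {
                out.append((char) ('a' + rng.nextInt(26)));
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String input = "the quick brown fox";
        // Two generators with the same seed produce the same noise.
        String first = mangle(input, 0.3, new Random(42L));
        String second = mangle(input, 0.3, new Random(42L));
        System.out.println(first.equals(second)); // prints true
    }
}
```

Passing the generator in, rather than creating it inside the method, is what lets a test fix the seed and compare accuracy across runs on identical noise.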
public void setManglerRNG(java.util.Random random)
random
- the random object to use for this dictionary
public void setSplitter(java.lang.String splitter)
splitter
- the string containing the splitter to use; null if not splitting files
public java.lang.String getFingerprintName()
public java.lang.String generateCreatingProgram()
public java.util.List<Fingerprint> computeFingerprint(java.lang.String filename)
filename
- the string filename of the document to fingerprint
public java.util.List<Fingerprint> computeFingerprint(TokenizerList list, java.lang.String str) throws java.io.IOException
list
- a tokenizer list that will be used to create a fingerprint from a string
str
- the document content text
java.io.IOException
- if the fingerprint cannot be computed
public java.util.List<Fingerprint> computeFingerprintXML(java.lang.String filename)
filename
- the string filename of the document to fingerprint
public void setDictionary(Dictionary dict)
dict
- the dictionary to use for this fingerprinter
public void setDictionary(java.lang.String filename)
filename
- the string filename of the dictionary to use for this fingerprinter
public org.jdom.Document generateXML(java.util.List<Fingerprint> fingerprintList, java.lang.String fileName)
public java.lang.String generateXML(java.util.List<Fingerprint> fingerprintPairList, java.lang.String document, java.lang.String outputFile)
fingerprintPairList
- the list of Pairs containing the Base64 encoded string containing the fingerprint and the start and end
byte information for splitter files
document
- the string filename of the document this fingerprint is for
outputFile
- the string XML filename to write the digest to
public void outputFields(java.lang.String config_file)
config_file
- the string XML filename containing the output specifications
public LongFastBloomFilter addBloomFilter(java.util.List<Token> tokenList)
tokenList
- the list of tokens used to create a bloom filter
public Dictionary getDictionary()
public java.util.List<Fingerprint> getBase64Fingerprints()