Project 3 - Forensic Hash Analysis
Assigned: October 24th, 2005
Updated - Oct 30, 9:30 PM
Computer geeks are lazy, but in a good way - they generally try to avoid unnecessary work. In computer forensics, they do this by using an automated process to find files that they already know about. One mechanism commonly used to do this is called hash analysis.
Simply put, a hash is a function that takes some data in the form of bits and returns a fixed-length string that is dependent on the data. Just to confuse things a little, the output of a hash function is also commonly called a hash. The cool thing about hash functions is that it is incredibly rare for two different bunches of bits to have the same hash. It is also insanely difficult to find data that will match a given hash output. There are some ways to find two different pieces of data that do produce the same hash output, though.
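To make the idea of "bits in, fixed-length value out" concrete, here is a minimal sketch of a toy hash function (32-bit FNV-1a). It is an illustration only: it is not one of the hashes used in real forensic hash sets, it is not required for this project, and the function name fnv1a is just a label chosen for this example.

```cpp
#include <cstdint>
#include <string>

// Toy 32-bit FNV-1a hash. Illustrative only: NOT a forensic-grade hash.
// Whatever the length of the input, the result is always 32 bits.
std::uint32_t fnv1a(const std::string& data) {
    std::uint32_t hash = 2166136261u;   // FNV offset basis
    for (unsigned char byte : data) {
        hash ^= byte;                   // mix in the next byte of input
        hash *= 16777619u;              // multiply by the FNV prime (wraps mod 2^32)
    }
    return hash;
}
```

Changing even one character of the input changes the result, which is the property hash analysis relies on. With only 32 bits of output, though, collisions in this toy function are far easier to find than in the hashes used in practice.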
For a hash analysis, the examiner will develop a library of interesting hashes, or download databases of known hashes, such as the one created by NIST. They will then compute the hashes of some interesting or suspect files from the case they are working on, and compare those with the known hashes to identify the ones that match.
Your job for this project will be to write a program to help perform hash analyses. Because we have some specific stuff to learn, namely using functions and vectors, I am going to make you do a few specific things in your program, as described below. For this project, the files we are working with will have a specific format. The hash files will start with three lines of comments describing what the hashes are; each of these lines starts with a # sign. All following lines of the file will have a hash, one per line. An example is shown below.
#This file contains hashes of hacking tools
The other type of file we will be working with contains a list of file names and hashes. These are the files we want to compare to our hash set. These files also start with 3 lines of comments. A small example of this is:
# These are hashes of the
Program Requirements
Since we are learning about functions, you will be required to use functions in your program. Since each function is its own little algorithm, it is best not to try to develop them all at once. Instead, you should develop and test each separately, and then combine them later. These instructions show you what you need to do, and provide some suggestions for this stepwise development.
Step 1
First, we need some functions to help us load the hash vectors and make sure they were loaded correctly. The prototypes for these functions are:
// Print out the contents of a vector of strings
Once you have written those, you can test them with this main function:
Step 2
Now we need to write some functions to help us read the file that has the filenames and hashes. We will call this the case file. The prototypes for these functions are:
// Prints the vector of files and the vector
Once you have written those, you should make sure that they both work by using this main function:
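A minimal sketch of case-file helpers follows. It rests on a loud assumption: after the three comment lines, each line holds a file name, some whitespace, and then that file's hash. Check the real case file before relying on this layout. The names loadCaseFile and printCaseFile are hypothetical.

```cpp
#include <cstddef>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper. ASSUMED line layout: <file name> <whitespace> <hash>.
// Fills two parallel vectors and returns the number of entries read.
int loadCaseFile(std::istream& in,
                 std::vector<std::string>& fileNames,
                 std::vector<std::string>& fileHashes) {
    std::string line;
    for (int i = 0; i < 3 && std::getline(in, line); ++i) {
        // discard one comment line
    }
    int count = 0;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string name, hash;
        if (fields >> name >> hash) {   // skip lines without both fields
            fileNames.push_back(name);
            fileHashes.push_back(hash);
            ++count;
        }
    }
    return count;
}

// Hypothetical helper: prints each file name next to its hash.
void printCaseFile(const std::vector<std::string>& fileNames,
                   const std::vector<std::string>& fileHashes,
                   std::ostream& out) {
    for (std::size_t i = 0; i < fileNames.size() && i < fileHashes.size(); ++i) {
        out << fileNames[i] << ' ' << fileHashes[i] << '\n';
    }
}
```

Keeping names and hashes in two parallel vectors means entry i of one always describes entry i of the other, so both must be filled together, as the loader does here.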
Step 3
Now that we can load the hashes and the case file, we need to write a function that will do the comparisons. What we want is a function that will go through all the hashes loaded from the case, find any that match a hash loaded from the hashes file, and print out the matches to the screen. We will also count how many matches there are and return that value. Below is a function prototype for that function. Notice that it
Step 4
Now we just have to write a main function that will let us do all this as a menu. You should read characters, sort them using a switch statement, and then call the right functions as needed. You can see how my solution does this by following the directions below.
Resources
As with the last projects, you can copy my solution to your gusun account and play with it as needed. To do this, type:
cp ~clay/hashes ./
cp ~clay/case ./
What to turn in
Important: Your output and input should be very similar to that shown in the example program. Please ask for the input in exactly the same order shown and only request the same items shown - do not ask for any other input. This will assist in grading your program.
Include the following header in your source code.
//
// Project 3
// Name: <your name>
// E-mail: <your e-mail address>
// COSC 071
//
// In accordance with the class policies and
// Georgetown's Honor Code, I certify that I
// have neither given nor received any assistance
// on this project with the exceptions of the
// lecture notes and those
// items noted below.
//
// Description: <Describe your program>
//
You will submit your
To submit your program, make sure there is a copy of the source code on your account on gusun. You may name your program what you like - let's assume that it is called hashes.cc. To submit your program electronically, use the submit program like we did in Homework 2 and Projects 1 and 2, but with the command:
submit -a p3 -f hashes.cc
Advice
Notice that I am not requiring a design document. Woohoo for you! But now you have the chance to really dig a hole for yourself by waiting until too late to get started. So, the start early rule still applies. Start early!
Second, even though I don't require a design document, you still need to think about the design before you start coding. Coding as you design is the path to unhappiness, sleeplessness, frustration, and a 50% chance of scattered program bugs and falling grades. Think before you write! When you ask me or the TAs for help, the first thing we are going to ask is: what is your algorithm? Have one.