Fall 2005

Clay Shields


Project 5 - Forensic Hash Analysis, Linked List Style

Assigned: November 28th, 2005
Program source code due: December 7th, 2005

Forensic Hash Analysis, Linked List Style

Computer geeks are lazy, but in a good way - they generally try to avoid unnecessary work. In computer forensics, they do this by using an automated process to find files that they already know about. One mechanism commonly used to do this is called hash analysis.

Simply put, a hash is a function that takes some data in the form of bits and returns a fixed-length string that is dependent on the data. Just to confuse things a little, the output of a hash function is also commonly called a hash. The cool thing about hash functions is that it is incredibly rare for two different bunches of bits to have the same hash. It is also insanely difficult to find data that will match a given hash output. There are some ways to find two different pieces of data that do produce the same hash output, though.
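To make the "fixed-length string" idea concrete, here is a small sketch of a toy hash. It uses 32-bit FNV-1a, which I picked only because it fits in a few lines; it is not the function that produced the 32-digit hashes in the sample files for this project, and it is nowhere near strong enough for real forensic work. The name toy_hash is made up for this example.

```cpp
#include <cstdint>
#include <iomanip>
#include <sstream>
#include <string>

// Toy 32-bit FNV-1a hash, for illustration only.
// Whatever the input length, the result is always 8 hex digits.
std::string toy_hash(const std::string &data) {
  std::uint32_t h = 2166136261u;   // FNV offset basis
  for (unsigned char c : data) {
    h ^= c;                        // mix in one byte of the data
    h *= 16777619u;                // multiply by the FNV prime
  }
  std::ostringstream out;
  out << std::hex << std::setw(8) << std::setfill('0') << h;
  return out.str();
}
```

Calling toy_hash on an empty string and on a whole paragraph both give back exactly 8 characters, and changing a single letter of the input changes the output.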

For a hash analysis, the examiner will develop a library of interesting hashes, or download databases of known hashes, such as the one created by NIST. They will then compute the hash of some interesting or suspect files from the case they are working on, and compare those with the known hashes to identify ones that match.
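The comparison step can be sketched in a few lines. This is only an illustration of the idea, assuming the known hashes sit in a std::set and the case entries in a vector of (name, hash) pairs; the function name matching_files is made up, and the project itself asks you to use the classes described below instead.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Returns the names of the case files whose hash appears in the known set.
std::vector<std::string> matching_files(
    const std::vector<std::pair<std::string, std::string> > &case_entries,
    const std::set<std::string> &known) {
  std::vector<std::string> matches;
  for (std::size_t i = 0; i < case_entries.size(); ++i) {
    // .first is the file name, .second is its hash
    if (known.count(case_entries[i].second) > 0) {
      matches.push_back(case_entries[i].first);
    }
  }
  return matches;
}
```

A set is used here because looking up a hash in it is fast; a plain loop over a vector of known hashes would give the same answer.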

Your job for this project will be to write a program to help perform hash analyses. Because we have some specific stuff to learn, namely using functions and vectors, I am going to make you do a few specific things in your program, as described below. For this project, the files we are working with will have a specific format. The hash files will start with three lines of comments describing what the hashes are; each of these lines starts with a # sign. All following lines of the file will have a hash, one to each line. An example is shown below.

#This file contains hashes of hacking tools
#Last updated 10/21/2005
#Clay Shields

The other type of file we will be working with contains a list of file names and hashes. These are the files we want to compare to our hash set. They also have 3 lines of comments preceding them. A small example of this is:

# These are hashes of the
# ~clay/classes/f05/071/projects/p3 directory
# 10/21/2005 Clay Shields
a.out 6cf502e4a3b2a92334b43c7ebbd5adec
data_file 1938010c6305308c0ca8d8b0d8dc4969
hashes 8eb379c256416aa5c12a72ba39162101
hashes.cc 256ae342b7a3c47e44f42b86c06ff39c
known_hashes 0972956ce73314bec610b69efc9870f0
p3.html 28ae7f13bb9c0148dc746be63bfc9b23
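Reading a file in this format can be sketched as follows. This is a minimal illustration, not the required design: it assumes file names contain no spaces, it reads from any istream so that it works with both an ifstream and a string, and the names read_case and name_and_hash are made up for the example.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

typedef std::pair<std::string, std::string> name_and_hash;

// Stores the three leading comment lines in `comments`, then returns
// every following "name hash" pair in the order it appears.
std::vector<name_and_hash> read_case(std::istream &in,
                                     std::vector<std::string> &comments) {
  std::string line;
  for (int i = 0; i < 3 && std::getline(in, line); ++i) {
    comments.push_back(line);      // keep the comment text as-is
  }
  std::vector<name_and_hash> entries;
  std::string name, hash;
  while (in >> name >> hash) {     // stops at end of input
    entries.push_back(name_and_hash(name, hash));
  }
  return entries;
}
```

The comment lines are read with getline because they contain spaces, while the name and hash on each remaining line can be pulled apart with the >> operator.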

Program Requirements

Wow, the same project again. I normally don't do this, but because time is short, it is the best way to get you to learn some new things without breaking your will to live in the process. Once again we are doing the same project, only this time using linked lists for keeping track of individual files and hashes within the case_file class. Again, this means you have much less code to write, because I am going to give you the outline of what the objects will look like; you will just need to fill in the correct methods. You don't even need a new main! To help you out, I have placed a .cc file in my account that you can copy over and use. It contains all the text that appears below, so that you can just use and edit it instead of typing everything over again. To get this file into your account, type:

cp ~clay/list-hashes-blank.cc ./

Just like last time, I am going to give you a series of steps to get to the end. I know that sometimes it is frustrating to get stuck and that it is tempting to move on, but it can be much worse to debug lots of code all at once. I really recommend doing it step-by-step and getting help at each step if needed.

Step 1

First, we are going to build an object that will hold the file information. This will be a class called file. All it really contains is a name, a hash, and a pointer to the next file.

class file{
  // friend functions for reading and writing
  friend ostream &operator<<(ostream&, file);
  friend istream &operator>>(istream&, file &);

 public:
  file();
  file(string, string);
  ~file();
  void set_name(string);
  string get_name();
  void set_hash(string);
  string get_hash();
  void set_next_file(file *);
  file * get_next_file();

 private:
  string filename;
  string hash;
  file * next_file;
};


// Default constructor for the file class


// Constructor to allow setting initial values for name
// and hash
file::file(string initial_name, string initial_hash){

}

// Set the name of the file
void file::set_name(string new_name){

}

// Find out what the name of the file is
string file::get_name(){

}

// Set the value of the hash
void file::set_hash(string new_hash){

}

// Find the value of the hash
string file::get_hash(){

}

// Set the next file
void file::set_next_file(file * new_file){

}

// Get the next file
file * file::get_next_file(){

}

// Destructor - nothing really needed here


// overload the output operator
ostream &operator<<(ostream &output, file f) {

}

// overload the input operator
istream &operator>>(istream &input, file &f){

}
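If the friend functions are the confusing part, here is a small sketch of how overloaded stream operators work, using a made-up class called entry rather than the file class itself. It is an illustration under my own assumptions, not the solution: in particular it passes the object to operator<< by const reference, while the prototype above passes it by value.

```cpp
#include <iostream>
#include <sstream>
#include <string>

class entry {  // hypothetical stand-in, not the project's file class
  // friends may touch the private members directly
  friend std::ostream &operator<<(std::ostream &, const entry &);
  friend std::istream &operator>>(std::istream &, entry &);

 public:
  entry(const std::string &n, const std::string &h) : name(n), hash(h) {}

 private:
  std::string name;
  std::string hash;
};

// Writes "name hash" and returns the stream so calls can be chained
std::ostream &operator<<(std::ostream &output, const entry &e) {
  output << e.name << " " << e.hash;
  return output;
}

// Reads two whitespace-separated words into the object
std::istream &operator>>(std::istream &input, entry &e) {
  input >> e.name >> e.hash;
  return input;
}
```

Returning the stream from each operator is what lets you write things like cout << e << endl in one statement.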

Your first task is to complete the file class and to make sure that it works. You should use the main function below to do so.

// Main for testing the file class
int main (){
  file * test = NULL;
  test = new file();
  cout << "File name should be 'Test' and it is: " << test->get_name() << endl;
  cout << "Hash should be 9eb8c6d611097c8fba484d399d7d9e97 and it is: "
       << test->get_hash() << endl;
  // Make sure the friend functions work
  cout << "The next two lines should be the same: " << endl;
  cout << "Test 9eb8c6d611097c8fba484d399d7d9e97" << endl;
  cout << *test << endl;
  cout << "Type the two words 'foo bar' and hit enter:";
  cin >> *test;
  cout << "The next line should say 'foo bar':" << endl;
  cout << *test << endl;
  // Make sure pointer operations work
  file * foo = new file();
  test->set_next_file(foo);
  cout << "The next file value should be " << foo << " and it is: " <<
    test->get_next_file() << endl;
  // Clean up the two objects we allocated
  delete foo;
  delete test;
  return 0;
}

Step 2

Now that the file class is working, we can rewrite our case_file class to use it. The main thing to remember is that the two vectors for the file names and hashes will be gone, replaced with the single linked list of file objects. The class definition changes a little, and now looks like this:

class case_file{
 public:
  case_file();
  void load_case_file();
  void print_file_names();
  void print_file_hashes();
  void print_comments();
  void print_case_file();
  int number_of_files();
  void add_file(string, string);
  int find_hash_matches(hash_set);

 private:
  file * file_list;
  void add_file_to_list(file *);
  vector<string> comments;
};

// Constructor for the case file. Needed to make sure things
// are properly initialized. Woe happens if pointers are random.


// This method asks for a file name, then opens the file, reads the
// comments from the first three lines and the file names and hashes
// after that. It stores the comment results in order in the comments
// vector, and the hashes in a linked list of file objects
void case_file::load_case_file(){

}

// This method prints the comments from the file neatly
// on the screen
void case_file::print_comments(){

}

// This method prints the names of the files neatly
// on the screen
void case_file::print_file_names(){

}

// This method prints the hashes of the files neatly
// on the screen
void case_file::print_file_hashes(){

}

// This method prints the entire case file on the screen,
// comments first, and then traverses the file list and uses cout
// to print each file object
void case_file::print_case_file(){

}

// This method returns the number of files in the case file
int case_file::number_of_files(){

}

// This method takes a file name and a file hash as parameters
// and then creates a new file object with those values and
// adds it to the file list.
void case_file::add_file(string new_name, string new_hash){

}

// This method takes a hash_set containing the known hashes as
// a parameter. It then finds all case hashes that match, and prints those
// to the screen. It returns the total number of matches.
int case_file::find_hash_matches(hash_set known_hashes){

}

// This is a private function that will add a pointer to a file to the
// file list. I add it to the head of the list
void case_file::add_file_to_list(file * new_file){

}
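If the linked list mechanics are new to you, the sketch below shows the two operations almost every method above relies on: adding a node at the head of a list and walking the list from head to tail. It uses a made-up struct called node with public members to keep things short; the names push_front, count_nodes and free_list are also mine. Your program should use the file class and its get and set methods instead.

```cpp
#include <cstddef>
#include <string>

struct node {  // hypothetical; the project uses the file class instead
  std::string name;
  std::string hash;
  node *next;
  node(const std::string &n, const std::string &h)
      : name(n), hash(h), next(NULL) {}
};

// Adds new_node at the head: the old head becomes the second node.
void push_front(node *&head, node *new_node) {
  new_node->next = head;
  head = new_node;
}

// Walks the list until the NULL at the end, counting nodes on the way.
int count_nodes(const node *head) {
  int count = 0;
  for (const node *cur = head; cur != NULL; cur = cur->next) {
    ++count;
  }
  return count;
}

// Deletes every node; saves the next pointer before each delete.
void free_list(node *&head) {
  while (head != NULL) {
    node *next = head->next;
    delete head;
    head = next;
  }
}
```

Adding at the head takes the same two pointer assignments no matter how long the list is, which is why it is the easy choice here. The price is that the nodes end up in the reverse of the order they were added. The same for loop used in count_nodes is the pattern for printing names, printing hashes, and looking for matches.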


Since the public interface of the class hasn't changed, we can use the same main from Project 4 to test it, and the same one for menuing. They are included in the file you can copy over.

As with the last projects, you can copy my solution to your gusun account and play with it as needed. To do this, type:

cp ~clay/list-hashes ./

You can also copy over the small sample case file or the sample hash set by typing the following two commands:

cp ~clay/case ./
cp ~clay/hash_set ./

What to turn in

Include the following header in your source code.

	    // Project 5
	    // Name: <your name>
	    // E-mail: <your e-mail address>
	    // COSC 071
	    // In accordance with the class policies and Georgetown's
	    // Honor Code, I certify that I have neither given nor
	    // received any assistance on this project with the
	    // exceptions of the lecture notes and those items noted
	    // below.
	    // Description: <Describe your program>

You will submit your source code using the submit program. This is the .cc file. Do not submit the compiled version! I don't speak binary very well.

To submit your program, make sure there is a copy of the source code on your account on gusun. You may name your program what you like - let's assume that it is called listhashes.cc. To submit your program electronically, use the submit program like we did in Homework 2 and Projects 1, 2, and 3, but with the command:

submit -a p5 -f listhashes.cc