Fall 2005

Clay Shields



Project 4 - Forensic Hash Analysis, Object Style

Assigned: October 24th, 2005
Program source code due: November 7th, 2005



Computer geeks are lazy, but in a good way: they generally try to avoid unnecessary work. In computer forensics, they do this by using an automated process to find files that they already know about. One mechanism commonly used to do this is called hash analysis.

Simply put, a hash is a function that takes some data in the form of bits and returns a fixed-length string that is dependent on the data. Just to confuse things a little, the output of a hash function is also commonly called a hash. The cool thing about good cryptographic hash functions is that it is incredibly rare for two different bunches of bits to have the same hash. It is also insanely difficult to find data that will match a given hash output. For some hash functions, though, researchers have found ways to construct two different pieces of data that produce the same hash output.

For a hash analysis, the examiner will develop a library of interesting hashes, or download databases of known hashes, such as the one created by NIST. They will then compute the hash of some interesting or suspect files from the case they are working on, and compare those with the known hashes to identify ones that match.

Your job for this project will be to write a program to help perform hash analyses. Because we have some specific stuff to learn, namely using classes and objects, I am going to make you do a few specific things in your program, as described below. For this project, the files we are working with will have a specific format. The hash files will start with three lines of comments describing what the hashes are; each of these lines starts with a # sign. All following lines of the file will have one hash per line. An example is shown below.

#This file contains hashes of hacking tools
#Last updated 10/21/2005
#Clay Shields
f4d5d0c0671be202bc241807c243e80b
9a8ad92c50cae39aa2c5604fd0ab6d8c
8eb379c256416aa5c12a72ba39162101
60b725f10c9c85c70d97880dfe8191b3
b0dcfcd427768ddc863f4cd34fb01de9
9ffbf43126e33be52cd2bf7e01d627f9
9e0b5b3061054e88eb8dd0053a00d245
The other type of file we will be working with contains a list of file names and their hashes. These are the files we want to compare to our hash set. These files also start with three lines of comments. A small example is:
# These are hashes of the
# ~clay/classes/f05/071/projects/p3 directory
# 10/21/2005 Clay Shields
a.out 6cf502e4a3b2a92334b43c7ebbd5adec
data_file 1938010c6305308c0ca8d8b0d8dc4969
hashes 8eb379c256416aa5c12a72ba39162101
hashes.cc 256ae342b7a3c47e44f42b86c06ff39c
known_hashes 0972956ce73314bec610b69efc9870f0
p3.html 28ae7f13bb9c0148dc746be63bfc9b23

Program Requirements

Ok, as you can see from the above, we are doing the exact same project as before, only this time with objects. This means you have much less code to write, because you can use either your own old code, or my posted solution code. Additionally, I am going to give you the outline of what the objects will look like; you will just need to fill in the correct methods and construct a new main with a proper menu. To help you out, I have placed a .cc file in my account that you can copy over and use. It contains all the text that appears below, so that you can just use and edit it instead of typing everything over again. To get this file into your account, type:

cp ~clay/obj-hashes-blank.cc ./

Just like last time, I am going to give you a series of steps to get to the end. I know that sometimes it is frustrating to get stuck and that it is tempting to move on, but it can be much worse to debug lots of code all at once. I really recommend doing it step-by-step and getting help at each step if needed.

Step 1

First, we are going to build a class whose objects will hold the known hashes. We are going to expand it a little from last time by keeping the comments at the head of the file and providing a method that reports how many hashes are in the known hash set. The class looks like this:


class hash_set {

public:
  void load_hashes();
  void print_hashes();
  void print_hash_set();
  int number_of_hashes();
  void print_comments();
  void add_hash(string);
  vector<string> get_hashes();

private:
  vector<string> comments;
  vector<string> hashes;

};


// This method asks for a file name, then opens the file,
// reads the comments from the first three lines and the
// hashes after that. It stores the comments in order
// in the comments vector, and the hashes in any order in the
// hashes vector. Any old values already in the vectors are
// overwritten.
void hash_set::load_hashes(){
//Fill this in
}

// This method prints the hashes neatly on the screen
void hash_set::print_hashes(){
//Fill this in
}

// This method prints the comments from the file neatly
// on the screen
void hash_set::print_comments(){
//Fill this in
}


// This method prints the whole hash set out,
// comments first, and then the hashes
void hash_set::print_hash_set(){
//Fill this in
}


// This method returns an integer that is the number
// of hashes in the hash set
int hash_set::number_of_hashes(){
//Fill this in
}

// This method adds the hash given as a parameter to the hash set
void hash_set::add_hash(string new_hash){
//Fill this in
}

// This method returns the vector containing the hashes
vector<string> hash_set::get_hashes(){
//Fill this in
}
    


Your first task is to fill in the code above. To make sure it works, test it with the main function below. In the file provided, this function is commented out with C-style /* ... */ comments. Note that looking at these main functions is a good way to see how the objects work.



///////////////////////////////
//
//  Main for testing the hash_set class
//
//

int main(){
  
  string newhash = "3f508486d0b740c8e15a5770e0f29581";
  int size;
  
  
  hash_set test_hashes;
  test_hashes.load_hashes();
  size = test_hashes.number_of_hashes();
  cout << "Size should be 7 and it is " << size << endl;
  test_hashes.print_comments();
  test_hashes.print_hashes();
  test_hashes.add_hash(newhash);
  test_hashes.print_hash_set();
 
  return 0;
}

When you run it on the hash set provided, you should get the following output:


Size should be 7 and it is 7
#This file contains hashes of hacking tools
#Last updated 10/21/2005
#Clay Shields
f4d5d0c0671be202bc241807c243e80b
9a8ad92c50cae39aa2c5604fd0ab6d8c
8eb379c256416aa5c12a72ba39162101
60b725f10c9c85c70d97880dfe8191b3
b0dcfcd427768ddc863f4cd34fb01de9
9ffbf43126e33be52cd2bf7e01d627f9
9e0b5b3061054e88eb8dd0053a00d245
#This file contains hashes of hacking tools
#Last updated 10/21/2005
#Clay Shields
f4d5d0c0671be202bc241807c243e80b
9a8ad92c50cae39aa2c5604fd0ab6d8c
8eb379c256416aa5c12a72ba39162101
60b725f10c9c85c70d97880dfe8191b3
b0dcfcd427768ddc863f4cd34fb01de9
9ffbf43126e33be52cd2bf7e01d627f9
9e0b5b3061054e88eb8dd0053a00d245
3f508486d0b740c8e15a5770e0f29581

Step 2

Now that the hash_set class works, we can go on and create a class for the case file. It will contain the vectors for the comments, file names, and hashes, along with all the methods we need to work with them. The outline of the class is:



class case_file{
  
public:
  void load_case_file();
  void print_file_names();
  void print_file_hashes();
  void print_comments();
  void print_case_file();
  int number_of_files();
  void add_file(string, string);
  int find_hash_matches(hash_set);

private:
  vector<string> comments;
  vector<string> names;
  vector<string> hashes;
};

// This method asks for a file name, then opens the file, reads the
// comments from the first three lines and the file names and hashes
// after that. It stores the comments in order in the comments vector,
// and each file name and its hash in the names and hashes vectors.
// The two vectors must stay in the same order, so that names[i] is
// the file whose hash is hashes[i]. Any old values already in the
// vectors are overwritten.
void case_file::load_case_file(){
// Fill this in
}

// This method prints the comments from the file neatly
// on the screen
void case_file::print_comments(){
// Fill this in
}

// This method prints the names of the files neatly
// on the screen
void case_file::print_file_names(){
// Fill this in
}

// This method prints the hashes of the files neatly
// on the screen
void case_file::print_file_hashes(){
// Fill this in
}

// This method prints the entire case file on the screen,
// comments first, and then each file name followed by its hash
void case_file::print_case_file(){
// Fill this in
}

// This method returns the number of files in the case file
int case_file::number_of_files(){
// Fill this in
}

// This method takes a file name and a file hash as parameters
// and then adds the name and hash to the case vectors
void case_file::add_file(string new_name, string new_hash){
// Fill this in
}

// This method takes a hash_set object as a parameter.
// It then finds all case hashes that match, and prints
// those to the screen. It returns the total number of matches.

int case_file::find_hash_matches(hash_set given_hashes){
// Fill this in
}

Once you have filled that in, you can test it with the main below. Again, this is a good way to see how to use the objects.



int main(){
  
  string newfile = "new.txt";
  string newhash = "3f508486d0b740c8e15a5770e0f29581";
  int size, found;

  hash_set test_hashes;
  test_hashes.load_hashes();
  test_hashes.add_hash(newhash);

  case_file test_case;
  test_case.load_case_file();
  test_case.print_comments();
  test_case.print_file_names();
  test_case.print_file_hashes();
  size =  test_case.number_of_files();
  cout << "Size should be 6 and it is " << size << endl;
  test_case.add_file(newfile, newhash);
  test_case.print_case_file();
  found = test_case.find_hash_matches(test_hashes);
  cout << "Found should be 2 and it is " << found << endl;
  
  return 0;
}

When you run it on the sample hash set and case file, you should get the following output:

Please enter the name of the hash file: hash_set
Please enter the name of the case file : case
# These are hashes of the 
# ~clay/classes/f05/071/projects/p3 directory
# 10/21/2005 Clay Shields
a.out
data_file
hashes
hashes.cc
known_hashes
p3.html
6cf502e4a3b2a92334b43c7ebbd5adec
1938010c6305308c0ca8d8b0d8dc4969
8eb379c256416aa5c12a72ba39162101
256ae342b7a3c47e44f42b86c06ff39c
0972956ce73314bec610b69efc9870f0
28ae7f13bb9c0148dc746be63bfc9b23
Size should be 6 and it is 6
# These are hashes of the 
# ~clay/classes/f05/071/projects/p3 directory
# 10/21/2005 Clay Shields
a.out  6cf502e4a3b2a92334b43c7ebbd5adec
data_file  1938010c6305308c0ca8d8b0d8dc4969
hashes  8eb379c256416aa5c12a72ba39162101
hashes.cc  256ae342b7a3c47e44f42b86c06ff39c
known_hashes  0972956ce73314bec610b69efc9870f0
p3.html  28ae7f13bb9c0148dc746be63bfc9b23
new.txt  3f508486d0b740c8e15a5770e0f29581
File: hashes matches known hashes with hash:
8eb379c256416aa5c12a72ba39162101
File: new.txt matches known hashes with hash:
3f508486d0b740c8e15a5770e0f29581
Found should be 2 and it is 2


Step 3

Now that you have working objects, all you have to do is recreate the main function from Project 3 using the exact same menu structure. Again, you can see my sample program on GUSUN, or look at what options my Project 3 solution uses.

Resources
As with the last projects, you can copy my solution to your gusun account and play with it as needed. To do this, type:

cp ~clay/obj-hashes ./

You can also copy over the small sample case file or the sample hash set by typing the following two commands:

cp ~clay/case ./
cp ~clay/hash_set ./

What to turn in

Include the following header in your source code.

	    //
	    // Project 4
	    // Name: <your name>
	    // E-mail: <your e-mail address>
	    // COSC 071
	    //
	    // In accordance with the class policies and Georgetown's
	    // Honor Code, I certify that I have neither given nor
	    // received any assistance on this project with the
	    // exceptions of the lecture notes and those items noted
	    // below.
	    //
	    //
	    // Description: <Describe your program>
	    //
	  

You will submit your source code using the submit program. This is the .cc file. Do not submit the compiled version! I don't speak binary very well.

To submit your program, make sure there is a copy of the source code on your account on gusun. You may name your program whatever you like; let's assume that it is called obj-hashes.cc. To submit your program electronically, use the submit program as we did in Homework 2 and Projects 1, 2, and 3, but with the command:

submit -a p4 -f obj-hashes.cc