Fall 2005
Clay Shields
Project 2 - Index.dat Investigator
Assigned: Oct. 3
Index.dat Investigator

Very often, computer forensic investigations are intended to answer some question about what a particular user has been doing on the system, and the forensic investigator gathers information to support or refute some hypothesis about the user's behavior. Forensic investigators can examine many files to find evidence. One common technique is to examine the user's internet history to determine which web sites were visited, and when. For a longer description of what can be found, take a look at this article.

For this project, we are going to write our own internet history parsing tool. The files we will be reading will consist of lines in the form:

<type> <time> <date> <domain> <file> <cache>

where:
<type> will always be the string "URL"

It is important to note that the computer adds to the bottom of the file as the user visits web pages, so the times and dates will always be in chronological order, with the oldest entry at the top. Also, since the computer may not be used every day, the dates may not be continuous.
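To make the line format concrete, here is a minimal sketch of splitting one line into its six fields. It assumes the fields are separated by whitespace and that no field contains a space; the struct IndexEntry and the function parseEntry are names made up for this illustration, and every field is kept as a plain string because the exact time and date formats are not assumed here.

```cpp
#include <sstream>
#include <string>

// One record from the index file (hypothetical helper type).
// All fields stay as strings; no time or date format is assumed.
struct IndexEntry {
    std::string type;    // expected to be "URL"
    std::string time;
    std::string date;
    std::string domain;
    std::string file;
    std::string cache;
};

// Splits one whitespace-separated line into an IndexEntry.
// Returns false if any of the six fields is missing or the type is not "URL".
bool parseEntry(const std::string& line, IndexEntry& entry) {
    std::istringstream in(line);
    if (!(in >> entry.type >> entry.time >> entry.date
             >> entry.domain >> entry.file >> entry.cache)) {
        return false;
    }
    return entry.type == "URL";
}
```

Reading with >> skips any amount of whitespace between fields, which is why the sketch does not care whether the fields are separated by spaces or tabs.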
Your assignment

You are to write a program that will ask for the name of an index file to read and for a particular web domain that you are interested in. The program will calculate the total number of different days in the index file, the total number of visits to that web domain, and the percentage of days in the file on which the specified domain was visited. This will help investigators understand the user's behavior: whether the user was a frequent visitor to a particular site, and how many files they looked at there. Below is the output of a program that does this:

gusun% ./index

You can run a version of the program that does this by logging onto gusun and typing ~clay/index. There are also some sample data files that you can use by copying them over to your directory. To do so, type:

cp ~clay/*.dat ~/

A short file named small.dat is available for you to look at; a longer one will be made available after the design documents are in.

UPDATE

I have added another file you can test your program with. You can use it by copying it over to your directory. To do so, type:

cp ~clay/long.dat ~/

and it will appear in your directory as long.dat. Or you can click here for it.

Part 1 - Design Document

For the first part you are to submit a design document showing the algorithm you plan to implement.

DO NOT SUBMIT A FLOWCHART. I won't grade it. Instead, write it out neatly using a language which is similar to that from Homework 1 and has the following terms:
statement ... statement end

A copy of your algorithm is due in class. Be sure to keep a copy for yourself!
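As one possible reading of the assignment, the three totals can be gathered in a single pass over the file. The sketch below assumes that all entries for the same date sit next to each other (which follows from the file being appended to in time order), that the domain must match exactly, and that the percentage means "days with at least one visit to the domain, out of all different days in the file". The names Summary and summarize are made up for this illustration; this is not the program in ~clay/index.

```cpp
#include <istream>
#include <sstream>  // std::istringstream, handy for trying the function on a string
#include <string>

// Totals the assignment asks for (hypothetical helper type).
struct Summary {
    int totalDays = 0;       // different dates seen in the file
    int totalVisits = 0;     // entries whose domain matches exactly
    int daysVisited = 0;     // different dates with at least one matching entry
    double percentDays = 0;  // daysVisited as a percentage of totalDays
};

// Reads entries of the form <type> <time> <date> <domain> <file> <cache>.
// Relies on entries for the same date being adjacent in the input.
Summary summarize(std::istream& in, const std::string& target) {
    Summary s;
    std::string type, time, date, domain, file, cache;
    std::string currentDate;    // date of the previous entry
    bool countedToday = false;  // has currentDate already been credited?

    while (in >> type >> time >> date >> domain >> file >> cache) {
        if (date != currentDate) {  // first entry of a new day
            currentDate = date;
            ++s.totalDays;
            countedToday = false;
        }
        if (domain == target) {
            ++s.totalVisits;
            if (!countedToday) {    // credit each day at most once
                ++s.daysVisited;
                countedToday = true;
            }
        }
    }
    if (s.totalDays > 0) {          // avoid dividing by zero on an empty file
        s.percentDays = 100.0 * s.daysVisited / s.totalDays;
    }
    return s;
}
```

Because the function takes a std::istream, the same code works on a std::ifstream opened on the index file or on a std::istringstream holding a few hand-written test lines.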
Part 2 - Program Source Code

Important: Your output and input should be very similar to that shown in the example program. Please ask for the input in exactly the same order shown and only request the same items shown - do not ask for any other input. This will assist in grading your program.

Include the following header in your source code.

//
// Project 1
// Name: <your name>
// E-mail: <your e-mail address>
// COSC 071
//
// In accordance with the class policies and Georgetown's Honor Code,
// I certify that I have neither given nor received any assistance
// on this project with the exceptions of the lecture notes and those
// items noted below.
//
// Description: <Describe your program>
//
To submit your program, make sure there is a copy of the source code on your account on gusun. You may name your program what you like - let's assume that it is called index.cc. To submit your program electronically, use the submit program like we did in Homework 2 and Project 1, but with the command:

submit -a p2 -f index.cc

I will not be enabling the electronic submission until after the design documents are in, so don't try to submit too early.

Bonus challenge section

You don't have to do this, but if you want a challenge, you can try to modify your program to count all visits to a particular domain. For example, a visitor might go to any of the following domains:
explore.georgetown.edu

It might be useful to us to be able to find all visits to any georgetown.edu site at once. Try to modify your program so that if the user enters georgetown.edu as the domain, it finds any site that ends in georgetown.edu, regardless of what it starts with. If you do this, be sure to note that you did so in your header comments.
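The "ends in" test for the bonus section can be sketched as a small helper. The function name domainMatches is made up for this illustration. It also makes one assumption that goes beyond the description above: the match must start at the beginning of the domain or right after a dot, so that a site such as notgeorgetown.edu is not counted as a georgetown.edu site.

```cpp
#include <string>

// Returns true if `domain` is exactly `suffix` or ends in "." followed by
// `suffix`. The dot-boundary check is an added assumption, not a stated
// requirement of the assignment.
bool domainMatches(const std::string& domain, const std::string& suffix) {
    if (suffix.empty() || domain.size() < suffix.size()) {
        return false;
    }
    // Position in `domain` where the suffix would have to begin.
    std::string::size_type start = domain.size() - suffix.size();
    if (domain.compare(start, suffix.size(), suffix) != 0) {
        return false;  // the tail of `domain` is not `suffix`
    }
    // Accept a whole-string match, or a match that begins right after a dot.
    return start == 0 || domain[start - 1] == '.';
}
```

With this helper, explore.georgetown.edu and georgetown.edu both match when the user enters georgetown.edu. Dropping the final check gives the plain "ends in" behavior if that is what you want.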