COSC 387: Artificial Intelligence

Project 4
Fall 2000

Due: Nov 22 @ 5 P.M.
13 points

Several researchers have used machine learning algorithms to learn behavioral profiles of people, for marketing or for customizing software. Another important use of these profiles is computer intrusion detection. The basic idea is to learn a profile of each individual user from that user's past computer use and then use the profile to verify the user's identity.

The file elmer.session is an example of a UNIX accounting file for a user named 'elmer', and you can get more information about the fields from the man page for the acctcom command. (Type 'man acctcom' at the UNIX prompt.) I'll give the details of how I went from the audit trail to the data set in lecture, but you can also find this information in a technical report. The section of interest is probably 4.1 on page 15.

Tasks:

  1. Implement the naive Bayes and k-nearest neighbor learning algorithms.

  2. Conduct an experiment to determine the best performing method using the data set in the file sec.data.lisp.

  3. To perform the experiment, conduct 10 runs. For each run, randomly divide the examples into a training set (60%) and a testing set (40%). Run naive Bayes and k-NN, for k = 3, 5, 7, 9, and compute the time spent learning, the time spent testing, and the percent correct on the examples in the testing set.

    You can time the execution of a Lisp function using the time macro. For example, if you have a function called classify and the list of testing examples bound to the symbol testing, then to time the classify function you simply type: (time (classify testing)).

  4. For each of the performance metrics (i.e., percent correct, learning time, and testing time) and each algorithm, compute the average over the ten runs to determine the best performing method.

  5. For the best performing method, train a classifier using all of the training data (i.e., sec.data.lisp), and determine whether any of the sessions in the file sec.unknown.lisp is likely to be that of an intruder.

  6. In the header comments of your source file, submit all of your analysis: the averages of the performance metrics, and your prediction for each of the ten unknown sessions.
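
The bookkeeping for tasks 3 and 4 can be sketched as below. This is only one possible arrangement, not a required design: the helper names (shuffle, split-examples, percent-correct, seconds-to-run, mean) are hypothetical, and the classifier and the function that extracts an example's true label are passed in as arguments because the layout of sec.data.lisp is not described here. Note that time prints its measurement rather than returning it, so this sketch uses the standard get-internal-run-time to obtain a number that can be averaged.

```lisp
;;; Sketch only. All helper names here are assumptions for illustration.

(defun shuffle (items)
  "Return a fresh, randomly permuted list of ITEMS (Fisher-Yates)."
  (let ((v (make-array (length items) :initial-contents items)))
    (do ((i (1- (length v)) (1- i)))
        ((< i 1) (coerce v 'list))
      ;; Swap position i with a random position in 0..i.
      (rotatef (aref v i) (aref v (random (1+ i)))))))

(defun split-examples (examples &optional (train-fraction 6/10))
  "Return two values: a random training list and the remaining testing list."
  (let* ((shuffled (shuffle examples))
         (n-train (round (* train-fraction (length shuffled)))))
    (values (subseq shuffled 0 n-train)
            (nthcdr n-train shuffled))))

(defun percent-correct (classify-fn testing label-fn)
  "Percentage of TESTING on which CLASSIFY-FN agrees with LABEL-FN.
Assumes TESTING is non-empty."
  (let ((hits (count-if #'(lambda (example)
                            (equal (funcall classify-fn example)
                                   (funcall label-fn example)))
                        testing)))
    (* 100.0 (/ hits (length testing)))))

(defun seconds-to-run (thunk)
  "Call THUNK; return two values: its result and the CPU seconds used."
  (let* ((start (get-internal-run-time))
         (result (funcall thunk))
         (end (get-internal-run-time)))
    (values result
            (/ (- end start) (float internal-time-units-per-second)))))

(defun mean (numbers)
  "Arithmetic mean of a non-empty list of numbers."
  (/ (reduce #'+ numbers) (length numbers)))
```

For each run you would call split-examples once, wrap the learning step and the testing step in seconds-to-run, collect the three numbers per algorithm, and apply mean to each collection after the tenth run.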

Instructions for Submission: Although you can use any Common Lisp environment, all programs must run under GNU Common Lisp (gcl). When you are ready to submit your program for grading, e-mail it as one file to the TA using the last four digits of your student ID and the suffix ".lisp" as the subject line.

For example, if the last four digits of your student ID are 1234, the name of your source file is proj1.lisp, and your TA's e-mail address is "imagoodtamaloof@cs", then you would type at the UNIX prompt:

gusun% mailx -s "1234.lisp" imagoodtamaloof@cs < proj1.lisp

You must e-mail your project before 5:00 P.M. on the due date.

Once you submit your project, it is important to keep an electronic copy that preserves the modification date and time. If we lose your project or the e-mail system breaks, then we will need to look at the modification date and time of your project to ensure that you submitted it before it was due.

Finally, when storing source code on university machines, it is important to set file permissions so that others cannot read the file. To remove read and write permissions for your group and for all other users, type chmod og-rw <file> at the UNIX prompt, where <file> is the name of your source file.