COSC 387: Artificial Intelligence

Project 4
Fall 2001

Due: Nov 19 @ 5 P.M.
13 points

Several researchers have used machine learning algorithms to learn behavioral profiles of people, for marketing or for customizing software. Another important use of these profiles is computer intrusion detection. The basic idea is to learn a profile of each individual user from that user's past computer use and then use the profile to verify the user's identity.

The file elmer.session is an example of a UNIX accounting file for a user named `elmer', and you can get more information about the fields from the man page for the acctcom command. (Type `man acctcom' at the UNIX prompt.) I'll give the details of how I went from the audit trail to the data set in lecture, but you can also find this information in a technical report. The section of interest is probably 4.1 on page 15.

Tasks:

  1. Implement the naive Bayes and k-nearest neighbor learning algorithms. For the learning elements, implement the functions nb-train and knn-train. For the performance elements, implement the functions nb-classify and knn-classify. You may use Lisp, C, C++, or Java, provided that your program compiles and executes under UNIX. If you use a language other than Lisp, attach your data files when you submit.

  2. Conduct an experiment to determine the best performing method using the data set in the file sec.data.lisp.

  3. To perform the experiment, conduct 10 runs. For each run, randomly divide the examples into a training set (60%) and a testing set (40%). Run naive Bayes and k-NN (for k = 3, 5, 7, and 9), and record for each the time spent learning, the time spent testing, and the percent correct on the examples in the testing set.

    You can time the execution of a Lisp function using the time function. For example, if you have a function called classify and the list of testing examples bound to the atom testing, then to time the classify function you simply type: (time (classify testing)).

  4. For each of the performance metrics (i.e., percent correct, learning time, and testing time) and each algorithm, compute the average over the ten runs to determine the best performing method.

  5. For the best performing method, train a classifier using all of the examples in sec.data.lisp, and determine whether any of the sessions in the file sec.unknown.lisp is likely to be that of an intruder.

  6. In the header comments of your source file, submit all of your analysis: the averages of the performance metrics, and your prediction for each of the ten unknown sessions.
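
One way to organize the runs in tasks 3 and 4 is sketched below in Common Lisp. It is only a sketch under stated assumptions: each example is assumed to be a list whose first element is the class label and whose remaining elements are the attribute values (check this against the actual layout of sec.data.lisp), and the helper names shuffle, split-examples, percent-correct, elapsed-seconds, and mean are illustrative, not part of the required interface. The time function prints its report, so the sketch uses get-internal-real-time to obtain a number that can be averaged.

```lisp
;;; Illustrative helpers for the 10-run experiment.
;;; ASSUMPTION: an example is a list of the form (label attr1 attr2 ...).

(defun shuffle (items)
  "Return a new list holding the elements of ITEMS in random order."
  (let ((vec (make-array (length items) :initial-contents items)))
    ;; Fisher-Yates: swap each position with a random earlier-or-equal one.
    (loop for i from (1- (length vec)) downto 1
          do (rotatef (aref vec i) (aref vec (random (1+ i)))))
    (coerce vec 'list)))

(defun split-examples (examples &optional (train-fraction 6/10))
  "Randomly split EXAMPLES; return the training and testing lists as two values."
  (let* ((shuffled (shuffle examples))
         (n-train (round (* train-fraction (length shuffled)))))
    (values (subseq shuffled 0 n-train)
            (subseq shuffled n-train))))

(defun percent-correct (classify-fn testing)
  "Percentage of TESTING examples whose predicted label matches the stored one.
CLASSIFY-FN takes the attribute values of one example and returns a label.
TESTING must be non-empty."
  (let ((correct (count-if (lambda (example)
                             (equal (funcall classify-fn (rest example))
                                    (first example)))
                           testing)))
    (* 100.0 (/ correct (length testing)))))

(defmacro elapsed-seconds (&body body)
  "Evaluate BODY; return elapsed real time in seconds and BODY's primary value."
  (let ((start (gensym "START"))
        (result (gensym "RESULT")))
    `(let* ((,start (get-internal-real-time))
            (,result (progn ,@body)))
       (values (/ (- (get-internal-real-time) ,start)
                  (float internal-time-units-per-second))
               ,result))))

(defun mean (numbers)
  "Arithmetic mean of a non-empty list of numbers."
  (/ (reduce #'+ numbers) (length numbers)))
```

For each run, call split-examples once and give the same training and testing sets to every method, so that the methods are compared on identical data. Collect the three measurements for each method in lists and pass those lists to mean after the tenth run. How you connect these helpers to nb-train, nb-classify, knn-train, and knn-classify depends on the argument lists you choose for those functions.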

Instructions for Submission: In the header comments, provide the following information:

;;;;
;;;; Name
;;;; E-mail Address
;;;; Platform: Windows, Linux, Solaris (cssun)
;;;; Lisp Environment: gcl, clisp, cmucl
;;;; Mail Client: mailx, pine, GUMail, Netscape, Yahoo!, etc.
;;;;
;;;; In accordance with the class policies and Georgetown's Honor Code,
;;;; I certify that, with the exceptions of the lecture notes and those
;;;; items noted below, I have neither given nor received any assistance
;;;; on this project.
;;;;

When you are ready to submit your program for grading, e-mail it to me as one file, using your net ID followed by the suffix ".lisp" as the subject line.

For example, if you were to submit using mailx on cssun, and if your net ID is ab123 and the name of your source file is proj1.lisp, then type at the UNIX prompt:

cssun% mailx -s "ab123.lisp" maloof@cs < proj1.lisp

If you use some other mail client, then follow the same instructions, and send your code as an attachment. Submit your project before 5:00 P.M. on the due date.

After you submit, keep an electronic copy of your project on either cssun or gusun, and do not modify it. If we lose your project or the e-mail system breaks, then we will need to look at the modification date and time of your copy to confirm that you finished it before it was due. If you developed your code on a Windows machine, then use a secure ftp client to transfer your file to cssun or gusun.

Finally, when storing source code on university machines, it is important to set file permissions so that others cannot read the file. To remove read and write permissions for your group and for all other users, type at the UNIX prompt chmod og-rw <file>, where <file> is the name of your source file.