COSC 388: Machine Learning

Project 2
Fall 2006

Due: Oct 13 @ 5 P.M.
7 points

In this project, you are to implement incremental versions of k-NN and naive Bayes, and to conduct a proper evaluation of these two methods using the Pima Indians Diabetes data set.

Tasks:

  1. Building upon your implementation for project 1, implement incremental versions of k-NN and naive Bayes. The implementations should be general in the sense that they should work for all data sets with symbolic class labels. Our convention is that the last attribute of the attribute declarations is the class label.

  2. Conduct a proper evaluation to determine the best performing method for the Pima Indians Diabetes data set. Information about this data set is in the file pima.info. To evaluate the learning methods, you will use ten-fold cross-validation. I have already split the data set into 10 pairs of training and testing sets and placed them in the archive pima.zip.

    1. Select performance metrics.

    2. Select a measure of variability.

    3. Implement routines to conduct the experiment, that is, to properly manipulate the data files and to compute the necessary performance metrics and statistics. Note that each algorithm must be applied to the same training and testing sets.

    4. Conduct the experiment based on your design. Run naive Bayes and k-NN, for k = 1, 3, 5, and 7.

  3. In a text file, describe your experimental design and present and analyze the results.
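
As a rough illustration of the experimental loop described in task 2, the following Python sketch trains an incremental learner one example at a time on each fold's training set, scores it on the matching testing set, and summarizes the per-fold accuracies with a mean and a sample standard deviation. Everything in it is an assumption made for illustration: the class name IncrementalKNN, the functions accuracy and evaluate, the choice of accuracy as the metric, the choice of standard deviation as the measure of variability, and the in-memory representation of a fold as a (train, test) pair of (attributes, label) lists. It is not the required design, and reading the files in pima.zip is left out.

```python
import math
from collections import Counter
from statistics import mean, stdev


class IncrementalKNN:
    """k-NN that learns one example at a time by storing it (hypothetical name)."""

    def __init__(self, k):
        self.k = k
        self.examples = []  # list of (attribute_tuple, label) pairs

    def update(self, x, label):
        # Incremental training step: k-NN only has to remember the example.
        self.examples.append((tuple(x), label))

    def predict(self, x):
        # Rank stored examples by Euclidean distance, then vote among the k nearest.
        nearest = sorted(self.examples, key=lambda ex: math.dist(ex[0], x))[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


def accuracy(learner, train, test):
    """Train on one fold's training set, then score on its testing set."""
    for x, label in train:
        learner.update(x, label)
    correct = sum(1 for x, label in test if learner.predict(x) == label)
    return correct / len(test)


def evaluate(make_learner, folds):
    """Return (mean, sample standard deviation) of accuracy over the folds."""
    # A fresh learner per fold keeps the folds independent of one another.
    scores = [accuracy(make_learner(), train, test) for train, test in folds]
    return mean(scores), stdev(scores)
```

Under these assumptions, a call such as evaluate(lambda: IncrementalKNN(3), folds) would be repeated for each method and each value of k, always passing the same list of folds so that every algorithm sees identical training and testing sets.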

Instructions for Submission: In the header comments, provide the following information:

//
// Name
// E-mail Address
// Platform: Windows, OS X, Linux (seva), Solaris (gusun, daruma), etc.
// Language/Environment: gcc, g++, java, g77, etc.
//
// In accordance with the class policies and Georgetown's Honor Code,
// I certify that, with the exceptions of the class resources and those
// items noted below, I have neither given nor received any assistance
// on this project.
//
When you are ready to submit your program for grading, create a compressed archive of a directory containing only your project's source, and send it to me by e-mail as an attachment. The directory's name should be the same as your net ID.

For example, assume your net ID is ab123. If the directory p2 contains your project, then rename the directory to ab123.

To make the archive smaller, remove any object files, such as .class, a.out, and .o files.

Use zip, tar, or jar to create an archive:

% zip ab123.zip ab123/*
% tar -cf ab123.tar ab123
% jar -cf ab123.jar ab123
Use jar only for Java projects. If you use jar or tar, then compress the archive by typing
% gzip ab123.tar
% gzip ab123.jar
which creates the file ab123.tar.gz or ab123.jar.gz, respectively.

N.B. If you use zip, then you need to change the extension of your file to something other than .zip, as UIS strips .zip attachments. The extension .piz works pretty well. So you'd rename ab123.zip to ab123.piz.

Attach the file containing your project to an e-mail and send it to me.

Make sure you send a carbon copy of your project to yourself, so you'll have a record of when you submitted your project. Ideally, also keep a copy on a university or department machine. However, make sure that your archive, directory, or files are not readable by others.

Submit your project before 5:00 P.M. on the due date.