COSC 388: Machine Learning

Project 2
Fall 2003

Due: Oct 10 @ 5 P.M.
10 points

In this project, you are to implement versions of k-NN and naive Bayes and to conduct a proper evaluation of these two methods using the Pima Indians Diabetes data set.

Tasks:

  1. Building upon your implementation for Project 1, implement k-NN and naive Bayes. Base these implementations on a proper object-oriented or procedural design. They should be general, in the sense that they work for any data set with numeric attributes and symbolic class labels.

  2. Conduct a proper evaluation to determine the best-performing method for the Pima Indians Diabetes data set. Information about this data set is in the file pima.names; the examples are in the file pima.data.

    1. Select a train/test method.

    2. Select performance metrics.

    3. Select a measure of variability.

    4. Select a statistical test.

    5. Implement routines to conduct the experiment, that is, to properly manipulate the data file and to compute the necessary performance metrics and statistics. Note that each algorithm should be applied to the same training and testing sets.

    6. Conduct the experiment based on your design. Run naive Bayes and k-NN, for k = 1, 3, 5, and 7.

  3. In one page or less, describe your experimental design, and present and analyze the results. This document should be in PDF format. You can use the Tom Conversion Server to convert Word documents to PDF, but I will hold you in the highest esteem if you use LaTeX.
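As one possible starting point for Task 1, here is a minimal k-NN sketch in Python (used only for illustration; any of the environments listed below is fine). The names KNN and euclidean and the train/classify interface are hypothetical. The sketch assumes unweighted Euclidean distance on raw attribute values; because the Pima attributes span quite different ranges (see pima.names), you may want to normalize them first.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric attribute vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


class KNN:
    """Hypothetical k-nearest-neighbor classifier: numeric attributes,
    symbolic class labels, unweighted majority vote, no attribute scaling."""

    def __init__(self, k=1):
        self.k = k

    def train(self, X, y):
        # k-NN is a lazy learner: training only stores the examples.
        self.X = list(X)
        self.y = list(y)
        return self

    def classify(self, x):
        # Rank training examples by distance to x (sorted() is stable, so
        # equal distances keep their original order) and keep the k closest.
        nearest = sorted(range(len(self.X)), key=lambda i: euclidean(self.X[i], x))
        votes = Counter(self.y[i] for i in nearest[: self.k])
        # most_common orders equal counts by first occurrence, so a tied vote
        # goes to the tied label that appears first among the nearest neighbors.
        return votes.most_common(1)[0][0]
```

Because k-NN only stores the training set, train is trivial and all of the work happens in classify, which is worth keeping in mind when timing runs for k = 1, 3, 5, and 7.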
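For naive Bayes with numeric attributes, one common choice (an assumption here, not a requirement of the assignment; discretizing the attributes is another) is to model each attribute within each class as a normal distribution. The Python sketch below estimates class priors and per-class attribute means and variances, then predicts the class with the largest sum of log probabilities, which avoids numeric underflow. The class name GaussianNaiveBayes and the variance floor of 1e-9 are hypothetical.

```python
import math
from collections import Counter


class GaussianNaiveBayes:
    """Hypothetical naive Bayes classifier that models each numeric attribute,
    within each class, as a normal distribution (an assumption, not the only choice)."""

    def train(self, X, y):
        n = len(y)
        counts = Counter(y)
        self.priors = {c: counts[c] / n for c in counts}  # P(class)
        self.params = {}  # class -> [(mean, variance) for each attribute]
        for c in counts:
            rows = [x for x, label in zip(X, y) if label == c]
            per_attr = []
            for j in range(len(rows[0])):
                col = [r[j] for r in rows]
                mean = sum(col) / len(col)
                var = sum((v - mean) ** 2 for v in col) / len(col)
                # Floor the variance so a constant attribute cannot divide by zero.
                per_attr.append((mean, max(var, 1e-9)))
            self.params[c] = per_attr
        return self

    def log_score(self, x, c):
        # log P(c) + sum_j log N(x_j; mean_cj, var_cj); working in logs avoids
        # underflow when many small probabilities are multiplied.
        total = math.log(self.priors[c])
        for v, (mean, var) in zip(x, self.params[c]):
            total += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        return total

    def classify(self, x):
        # Predict the class with the largest log posterior (up to a shared constant).
        return max(self.priors, key=lambda c: self.log_score(x, c))
```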
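For the train/test method in Task 2, one reasonable choice (assumed here, not prescribed) is stratified 10-fold cross-validation. The Python sketch below assigns example indices to folds once, with a fixed random seed, so that every method can be trained and tested on exactly the same splits, as item 5 requires. The function names stratified_folds and train_test_splits and the seed value are hypothetical.

```python
import random
from collections import defaultdict


def stratified_folds(y, n_folds=10, seed=388):
    """Hypothetical helper: split example indices into n_folds folds whose class
    proportions roughly match those of the whole data set."""
    rng = random.Random(seed)  # fixed seed: every method gets identical folds
    by_class = defaultdict(list)
    for i, label in enumerate(y):
        by_class[label].append(i)
    folds = [[] for _ in range(n_folds)]
    slot = 0
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        # Deal this class's examples round-robin; continuing the counter across
        # classes keeps fold sizes within one of each other.
        for i in indices:
            folds[slot % n_folds].append(i)
            slot += 1
    return folds


def train_test_splits(folds):
    """Yield (train_indices, test_indices); each fold serves as the test set once."""
    for k, test in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, test
```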
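For performance metrics and a measure of variability (items 2 and 3 of Task 2), a simple option is to record accuracy on each test fold, keep a confusion matrix for closer analysis, and report the mean and sample standard deviation of the per-fold accuracies with a t-based 95% confidence interval. The Python helpers below are a sketch under those assumptions; the function names are hypothetical, and the caller supplies the two-sided t critical value for n - 1 degrees of freedom (2.262 for 10 folds).

```python
import math
import statistics
from collections import Counter


def accuracy(actual, predicted):
    """Fraction of test examples whose predicted label equals the actual label."""
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


def confusion_counts(actual, predicted):
    """Counts of (actual, predicted) label pairs; pairs with equal labels are correct."""
    return Counter(zip(actual, predicted))


# Two-sided 95% critical value of Student's t with 9 degrees of freedom (10 folds).
T_CRIT_95_DF9 = 2.262


def summarize(fold_scores, t_crit):
    """Mean, sample standard deviation, and a t-based confidence interval for the
    mean of per-fold scores; t_crit must match len(fold_scores) - 1 degrees of freedom."""
    mean = statistics.mean(fold_scores)
    sd = statistics.stdev(fold_scores)
    half = t_crit * sd / math.sqrt(len(fold_scores))
    return mean, sd, (mean - half, mean + half)
```

For ten folds, summarize(scores, T_CRIT_95_DF9) gives the mean, the standard deviation, and the interval to report for each method.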
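When each method is evaluated on the same folds, a paired t-test on the per-fold accuracies is one standard choice for the statistical test in item 4. The Python sketch below computes t = mean(d) / (s_d / sqrt(n)) over the per-fold differences d, with n - 1 degrees of freedom. The function names are hypothetical, and the critical value is supplied by the caller.

```python
import math
import statistics


def paired_t(scores_a, scores_b):
    """Paired t statistic for two methods scored on the same folds.
    Returns (t, degrees_of_freedom)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("per-fold differences are all equal; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1


def significantly_different(scores_a, scores_b, t_crit):
    """Two-sided test: t_crit is the critical value of Student's t for
    n - 1 degrees of freedom (e.g., 2.262 for 10 folds at the 0.05 level)."""
    t, _ = paired_t(scores_a, scores_b)
    return abs(t) > t_crit
```

Comparing naive Bayes against each k-NN variant, or the k-NN variants against one another, means calling significantly_different once per pair of methods.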
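Items 5 and 6 can be tied together with a small driver. The Python sketch below assumes that pima.data holds one comma-separated example per line with the class label in the last column (check pima.names). It trains and tests every method on each split before moving to the next, so all methods see identical training and testing sets. The names parse_examples, run_experiment, and MajorityClass are hypothetical; MajorityClass is only a stand-in that keeps the sketch self-contained, and your own k-NN and naive Bayes classes would replace it.

```python
from collections import Counter


def parse_examples(lines):
    """Parse comma-separated lines into (X, y): numeric attributes first and the
    symbolic class label in the last column (an assumed layout for pima.data)."""
    X, y = [], []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = [f.strip() for f in line.split(",")]
        X.append([float(v) for v in fields[:-1]])
        y.append(fields[-1])
    return X, y


class MajorityClass:
    """Hypothetical stand-in with a train/classify interface; it always predicts
    the most common training label. Substitute your k-NN and naive Bayes classes."""

    def train(self, X, y):
        self.label = Counter(y).most_common(1)[0][0]
        return self

    def classify(self, x):
        return self.label


def run_experiment(X, y, factories, splits):
    """Train and test every method on the SAME (train, test) index splits and
    return {method name: [accuracy on each split]}."""
    results = {name: [] for name in factories}
    for train, test in splits:
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        for name, make in factories.items():
            model = make().train(train_X, train_y)
            correct = sum(model.classify(X[i]) == y[i] for i in test)
            results[name].append(correct / len(test))
    return results
```

With classes hypothetically named KNN and NaiveBayes, the factories for item 6 could be {"naive Bayes": NaiveBayes, "1-NN": lambda: KNN(1), "3-NN": lambda: KNN(3), "5-NN": lambda: KNN(5), "7-NN": lambda: KNN(7)}, and the data could be read with open("pima.data") passed to parse_examples.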

Instructions for Submission: In the header comments of your source files, provide the following information:

//
// Name
// E-mail Address
// Platform: Windows, OS X, Redhat, Solaris (cssun/gusun/daruma), etc.
// Development Environment: gcc, g++, java, g77, etc.
// Mail Client: mailx, pine, GUMail, Netscape, Yahoo!, etc.
//
// In accordance with the class policies and Georgetown's Honor Code,
// I certify that, with the exceptions of the lecture notes and those
// items noted below, I have neither given nor received any assistance
// on this project.
//
When you are ready to submit your project for grading, create a compressed archive of a directory containing your project's files and send it to me by e-mail as an attachment, as you did for Project 1.

Submit your project before 5:00 PM on the due date.

Once you have submitted your project, it is important to keep an electronic copy of it on either cssun or gusun. These systems are backed up regularly, and if we lose your project or the e-mail system breaks, then we will need to look at the modification date and time of your project to ensure that you submitted it before it was due. If you developed your code on a Windows machine, then use a secure FTP client to transfer your files or the archive to cssun or gusun.

Finally, when storing source code on university machines, it is important to set file permissions so that others cannot read your files. To remove read and write permissions for group members and others, type chmod og-rw <file> at the UNIX prompt, where <file> is the name of your source file.