COSC 288: Introduction to Machine Learning

Project 2
Spring 2016

Due: Mon, Feb 29 @ 11:59 P.M.
10 points

Building upon your implementation for p1, implement k-NN and naive Bayes for nominal attributes. Also implement routines for k-fold cross-validation.

The implementations should be general in the sense that they should work for all data sets with nominal attributes and nominal class labels. Our convention is that the last attribute of the attribute declarations is the class label.

Tasks:

  1. Design and implement a (short) class hierarchy for classifiers and the learning methods for this project. We'll discuss your thoughts about possible designs in lecture. Based on the discussion thus far, good designs must implement a common Classifier abstraction that both learners extend, so that routines such as the Evaluator (see task 4) work for any classifier; a minimal sketch appears after this list.

  2. Implement k-NN. Your implementation does not have to use a fixed-capacity max-heap, but the data structure you use to determine the k closest neighbors must use O(k) space; one such structure is sketched after this list. The implementation of k-NN should include the command-line switch -k to specify the value of k. Use 3 as the default.

  3. Implement naive Bayes, using add-one (Laplace) smoothing for nominal attributes; the smoothed estimate is sketched after this list.

  4. Implement an Evaluator class that evaluates the performance of any Classifier using k-fold cross-validation; a sketch of the cross-validation loop appears after this list. Use the option -x to specify the number of folds, with 10 being the default.

    Implement the learners as two separate executables: NaiveBayes and kNN. No windows. No menus. No prompts. Just do it.

    The logic of each implementation should be as follows. The user must provide a training set (using the -t switch). The program should evaluate the method on the training set using 10-fold cross-validation and output the results. Naturally, the user can use the -x switch to change the default. The output should consist only of the average accuracy and some measure of dispersion, such as variance, standard deviation, standard error, or a 95% confidence interval.

  5. Evaluate the algorithms and your implementations using 10-fold cross-validation. Use the mushroom data set (mushroom.mff) and the votes data set (votes.mff). Run naive Bayes (obviously) and k-NN, for k = 1, 3, 5, and 7. Place the output of these runs in a text file and include it with your submission. On a Unix machine, you can record program output using the script command, or you can redirect the output to a file; example invocations appear after this list.
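
To make task 1 concrete, here is one minimal sketch of a classifier hierarchy, assuming Java. The Example and DataSet types are hypothetical stand-ins for your p1 representations, and the method names are illustrative, not required.

    import java.util.List;

    // Hypothetical stand-ins for the p1 data representations.
    class Example {
        int[] values;   // nominal attribute values, encoded as indices
        int label;      // class label (the last attribute, by convention)
    }

    class DataSet {
        List<Example> examples;
        DataSet(List<Example> examples) { this.examples = examples; }
    }

    // The common abstraction: both learners extend Classifier, so the
    // Evaluator can work with either one.
    abstract class Classifier {
        abstract void train(DataSet data) throws Exception;
        abstract int classify(Example x) throws Exception;
    }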
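
For task 2, one structure that meets the O(k) space requirement is a bounded max-heap keyed on distance: keep at most k candidates, and evict the current farthest whenever a closer training example appears. A sketch, assuming Java's PriorityQueue and a simple mismatch-count (Hamming) distance for nominal attributes:

    import java.util.Comparator;
    import java.util.PriorityQueue;

    class KNearest {
        // Return the indices of the k training examples closest to the
        // query. The heap holds at most k entries of {distance, index},
        // ordered so the farthest candidate sits at the root -- O(k) space.
        static int[] kNearest(int[][] train, int[] query, int k) {
            PriorityQueue<double[]> heap = new PriorityQueue<>(
                k, Comparator.comparingDouble((double[] e) -> e[0]).reversed());
            for (int i = 0; i < train.length; i++) {
                double d = distance(train[i], query);
                if (heap.size() < k) {
                    heap.add(new double[] { d, i });
                } else if (d < heap.peek()[0]) {  // closer than current farthest
                    heap.poll();                  // evict the farthest
                    heap.add(new double[] { d, i });
                }
            }
            int[] indices = new int[heap.size()];
            for (int j = 0; j < indices.length; j++)
                indices[j] = (int) heap.poll()[1];
            return indices;
        }

        // One reasonable distance for nominal attributes: the number of
        // attributes on which the two examples disagree.
        static double distance(int[] a, int[] b) {
            int mismatches = 0;
            for (int i = 0; i < a.length; i++)
                if (a[i] != b[i]) mismatches++;
            return mismatches;
        }
    }

The prediction is then a majority vote over the labels of the returned neighbors; since the heap never holds more than k entries, the space bound holds regardless of the size of the training set.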
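
For task 3, add-one (Laplace) smoothing estimates the probability of attribute value v given class c as (count(v, c) + 1) / (count(c) + |V|), where |V| is the number of values the attribute can take; this keeps unseen value/class combinations from zeroing out the product. A sketch of the estimate, assuming the counts have already been tallied from the training set:

    class Smoothing {
        // counts[c][j][v] = class-c training examples with value v for
        //                   attribute j
        // classCounts[c]  = training examples with class c
        // numValues[j]    = number of values attribute j can take
        static double pValueGivenClass(int[][][] counts, int[] classCounts,
                                       int[] numValues, int c, int j, int v) {
            return (counts[c][j][v] + 1.0) / (classCounts[c] + numValues[j]);
        }
    }

At prediction time, summing log probabilities (the log prior plus the log of each smoothed estimate) avoids floating-point underflow on data sets with many attributes.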
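
For task 4, a sketch of the cross-validation loop, reusing the hypothetical types from the hierarchy sketch above and reporting the mean accuracy with its standard deviation (any of the dispersion measures listed above would do):

    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;

    class CrossValidation {
        // Shuffle once, deal the examples into k folds, then train on
        // k-1 folds and test on the held-out fold. Returns {mean accuracy,
        // standard deviation over the k folds}; assumes k >= 2.
        static double[] crossValidate(Classifier learner, DataSet data, int k)
                throws Exception {
            List<Example> shuffled = new ArrayList<>(data.examples);
            Collections.shuffle(shuffled);
            double[] acc = new double[k];
            for (int fold = 0; fold < k; fold++) {
                List<Example> train = new ArrayList<>();
                List<Example> test = new ArrayList<>();
                for (int i = 0; i < shuffled.size(); i++)
                    (i % k == fold ? test : train).add(shuffled.get(i));
                learner.train(new DataSet(train));
                int correct = 0;
                for (Example x : test)
                    if (learner.classify(x) == x.label) correct++;
                acc[fold] = (double) correct / test.size();
            }
            double mean = 0.0;
            for (double a : acc) mean += a;
            mean /= k;
            double var = 0.0;
            for (double a : acc) var += (a - mean) * (a - mean);
            var /= (k - 1);   // sample variance over the folds
            return new double[] { mean, Math.sqrt(var) };
        }
    }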
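
For task 5, assuming the two executables are built as plain Java classes, a recording session might look like the following; only the class names and switches above are specified, and the output file name is yours to choose.

    java NaiveBayes -t mushroom.mff  > results.txt
    java kNN -t mushroom.mff -k 1   >> results.txt
    java kNN -t mushroom.mff -k 3   >> results.txt

...and so on for k = 5 and 7, and for votes.mff.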

Instructions for Submission

In the header comments in at least the main file of your project, provide the following information:
//
// Name
// E-mail Address
// Platform: Windows, MacOS, Linux, Solaris, etc.
// Language/Environment: gcc, g++, java, g77, ruby, python.
//
// In accordance with the class policies and Georgetown's Honor Code,
// I certify that, with the exceptions of the class resources and those
// items noted below, I have neither given nor received any assistance
// on this project.
//

When you are ready to submit your program for grading, create a zip file of the directory containing only your project's source and build instructions, and upload it to Blackboard.

Copyright © 2019 Mark Maloof. All Rights Reserved. This material may not be published, broadcast, rewritten, or redistributed.