COSC 288: Introduction to Machine Learning

Project 3
Spring 2009

Due: Fri, Mar 20 @ 10 P.M.
13 points

Implement ID3 for symbolic attributes. Use information gain to select attributes. Use the votes and mushroom data sets to evaluate your implementation. The implementation must be general: it should work on any data set with symbolic attributes, not just these two.
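As a point of reference, information gain for a symbolic attribute can be sketched as below. This is a minimal illustration, not a required design: the function names (entropy, information_gain, best_attribute) and the representation of examples as tuples of attribute values with a parallel list of class labels are assumptions made for the example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(rows, labels, attr):
    """Reduction in entropy from splitting on the attribute at index attr.

    rows is a list of tuples of symbolic values (an assumed layout);
    labels is the parallel list of class labels.
    """
    # Group the class labels by the value each example takes for attr.
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    # Expected entropy after the split, weighted by partition size.
    remainder = sum(
        (len(part) / len(labels)) * entropy(part)
        for part in partitions.values()
    )
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """Return the attribute index in attrs with the highest gain."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))
```

ID3 would call something like best_attribute at each node, split the examples on the chosen attribute, and recurse on each partition with that attribute removed from the candidate list.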

Implement the learner as a single executable. No windows. No menus. No prompts. Just do it.

The logic of the implementation should be the same as that for the implementations of p2. If the user runs a learner and specifies only a training set, then the program should evaluate using 10-fold cross-validation and output the results. Naturally, the user can use the -x switch to change the default. Otherwise, if the user specifies both a training and testing set, then the program should build a model from the training set, evaluate it on the testing set, and output the results.
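The control flow described above can be sketched as follows. Only the -x switch is named in this handout; the -t and -T flag names for the training and testing files, the reading of -x as a fold count, and the function names are assumptions made for illustration. Follow whatever conventions your p2 implementation used.

```python
import argparse


def parse_args(argv):
    """Parse command-line switches (flag names other than -x are assumed)."""
    parser = argparse.ArgumentParser(description="ID3 learner (sketch)")
    parser.add_argument("-t", dest="train", required=True,
                        help="training set file (hypothetical flag name)")
    parser.add_argument("-T", dest="test", default=None,
                        help="testing set file (hypothetical flag name)")
    parser.add_argument("-x", dest="folds", type=int, default=10,
                        help="number of cross-validation folds (assumed meaning)")
    return parser.parse_args(argv)


def evaluation_plan(args):
    """Decide how to evaluate, based on which data sets were given."""
    if args.test is None:
        # Training set only: k-fold cross-validation, 10 folds by default.
        return ("cross-validation", args.folds)
    # Training and testing sets: build on one, evaluate on the other.
    return ("train-test", args.test)
```

For example, evaluation_plan(parse_args(["-t", "votes.mff"])) selects 10-fold cross-validation, while adding "-x", "5" changes the fold count to 5.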

Your object-oriented design should be something that only a software engineer would love, appreciate, and cherish.

  • Evaluate the algorithm and your implementation using 10-fold cross-validation. Use the mushroom data set: mushroom.mff, and the votes data set: votes.mff. Run ID3 on each data set, place the output of these runs in a text file, and include it with your submission. On a Unix machine, you can record program output using the script command, or you can redirect the output to a file.
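One way to partition examples for k-fold cross-validation is sketched below. It is a plain, unstratified split; the function name k_fold_indices, the fixed seed, and the choice to work with example indices rather than the examples themselves are assumptions made for illustration.

```python
import random


def k_fold_indices(n, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    The n example indices are shuffled once and dealt into k folds whose
    sizes differ by at most one. Each fold serves as the test set exactly
    once, and the remaining folds form the training set.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

Accuracy for the run is then the total number of correct test predictions across all folds divided by n.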

    Instructions for Submission

    In the header comments in at least the main file of your project, provide the following information:
    //
    // Name
    // E-mail Address
    // Platform: Windows, MacOS, Linux, Solaris, etc.
    // Language/Environment: gcc, g++, java, g77, ruby, python, haskell, etc.
    //
    // In accordance with the class policies and Georgetown's Honor Code,
    // I certify that, with the exceptions of the class resources and those
    // items noted below, I have neither given nor received any assistance
    // on this project.
    //
    
    Make sure I have clear instructions on how to run your executables. If you're using C or C++, then provide a Makefile.

    Submit via Blackboard. When you are ready to submit your program for grading, create a compressed archive of a directory containing only your project's source, and upload it to Blackboard. The directory's name should be the same as your net ID. If you need to include a note with your submission, put the note in a README file in the directory.

    For example, assume your net ID is ab123. If the directory p3 contains your project, then rename the directory to ab123.

    To make the archive smaller, remove any compiled files, such as .class, .o, and a.out files.

    Use zip, tar, or jar to create an archive:

    % zip -r ab123.zip ab123/*
    % tar -cf ab123.tar ab123
    % jar -cf ab123.jar ab123
    
    Use jar only for Java projects. If you use jar or tar, then compress the archive by typing
    % gzip ab123.tar
    % gzip ab123.jar
    
    which creates the file ab123.tar.gz or ab123.jar.gz, respectively.

    Upload the compressed archive to Blackboard.

    Submit your project before 10:00 P.M. on the due date.