COSC-575: Machine Learning

Project 1
Fall 2018

Due: R 9/27 @ 5:00 PM
10 points

  1. Write a program and the classes and methods necessary to read, parse, store, and output examples in Mark's File Format (mff). Consider the following version of the Bikes data set:

    Make         Tires   Handle Bars  Water Bottle  Weight  Bike Type
    Trek         Knobby  Straight     y             250.3   Mountain
    Bridgestone  Treads  Straight     y             200     Hybrid
    Cannondale   Knobby  Curved       n             222.9   Mountain
    Nishiki      Treads  Curved       y             190.3   Hybrid
    Trek         Treads  Straight     y             196.8   Hybrid

    The file bikes.mff shows this data set in Mark's File Format. A file containing a valid data set begins with '@dataset' followed by an identifier. Attribute declarations appear next. Each declaration begins with the string '@attribute' and declares either a symbolic attribute or a numeric attribute. The attribute's name appears next, followed by its domain. The domain of a symbolic attribute is a list of values separated by whitespace. The domain of a numeric attribute is not explicitly specified and is assumed to be the set of representable integer and floating-point numbers.

    The token '@examples' separates the attribute declarations from the examples, which are simply values separated by whitespace.

    For simplicity, you can assume that all elements of the file are separated by at least one space character. Moreover, each attribute declaration and each example will appear on a single line. I have defined a grammar for the file format.

    As for the design, I have produced documentation for the classes and their methods. You must follow this design. To save you some keystrokes, I have put the source files with method stubs and several data sets in the zip file p1.zip.

  2. The executable should take input from the command line. Use -t to specify the name of the training file and use -T to specify the name of the testing file, if any. The following are examples of commands:
    % java p1 -t bikes.mff
    % java p1 -t bikes.train
    % java p1 -t bikes.train -T bikes.test
    % java p1 -t bikes-tr.mff -T bikes-te.mff
    
    For this project, the program can simply read the input files, populate the appropriate objects, and output the data set or sets to the console in the same format they appear in the file. The implementation must be general, meaning that it should work for all possible data sets.

  3. Implement the program using Java 8. You must implement your program using Java's standard libraries, meaning that you cannot use a library of machine-learning routines to complete the project. If you want to use additional resources or something non-standard, check with me first.

    You can use any platform for development, but your project must compile on cs-class when the following command is run in the directory containing your source:

    $ javac *.java
    
    Do not use packages.
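
The command-line convention in item 2 and the whitespace-separated layout described in item 1 can be illustrated with a minimal sketch. It assumes a hypothetical helper class, ArgsSketch, that is not part of the documented class design. The choice to throw IllegalArgumentException for a malformed command line is also an assumption, since the handout does not say how errors should be reported.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration only; the required design is the
// one given in the course documentation, not this class.
class ArgsSketch {
    String trainFile; // file name following -t, or null if -t was not given
    String testFile;  // file name following -T, or null if -T was not given

    // Scans the argument array for "-t <file>" and "-T <file>".
    // The two options differ only by case, so the comparison is case-sensitive.
    static ArgsSketch parse(String[] args) {
        ArgsSketch opts = new ArgsSketch();
        for (int i = 0; i < args.length; i++) {
            boolean isTrain = args[i].equals("-t");
            boolean isTest = args[i].equals("-T");
            if (!isTrain && !isTest) {
                // Assumption: anything other than -t or -T is rejected.
                throw new IllegalArgumentException("unknown option: " + args[i]);
            }
            if (i + 1 >= args.length) {
                throw new IllegalArgumentException("missing file name after " + args[i]);
            }
            i++; // advance to the file name that follows the option
            if (isTrain) {
                opts.trainFile = args[i];
            } else {
                opts.testFile = args[i];
            }
        }
        return opts;
    }

    // Splits one line into tokens on runs of whitespace, which is enough
    // when every element is separated by at least one space character.
    static List<String> tokens(String line) {
        String trimmed = line.trim();
        if (trimmed.isEmpty()) {
            return Collections.emptyList(); // "".split(...) would yield [""]
        }
        return Arrays.asList(trimmed.split("\\s+"));
    }

    public static void main(String[] args) {
        ArgsSketch opts = parse(args);
        System.out.println("train: " + opts.trainFile);
        System.out.println("test:  " + opts.testFile);
    }
}
```

For instance, tokens("Trek Knobby Straight y 250.3 Mountain") yields the six values of the first example in the Bikes data set, and parse applied to the arguments -t bikes.train -T bikes.test records bikes.train as the training file and bikes.test as the testing file.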

Instructions for Submission

In a file named HONOR, please include the statement:

    In accordance with the class policies and Georgetown's Honor Code,
    I certify that, with the exceptions of the class resources and those
    items noted below, I have neither given nor received any assistance
    on this project.

Include this file with your submission. When you are ready to upload to Autolab, create the zip file for submission by typing:

    $ zip submit.zip *.java HONOR

You can submit to Autolab four times: two compile checks, which confirm that your code compiles against the autograder, and two project submissions, which are automatically graded. The last automatically graded submission determines the grade for the project.

Submit via Autolab.

Plan B

If Autolab is down, upload your zip file to Canvas.

Copyright © 2019 Mark Maloof. All Rights Reserved. This material may not be published, broadcast, rewritten, or redistributed.