Project 1
Spring 2016
Due: Thu, Feb 11 @ 11:59 PM
10 points
Make        | Tires  | Handle Bars | Water Bottle | Weight | Bike Type
Trek        | Knobby | Straight    | y            | 250.3  | Mountain
Bridgestone | Treads | Straight    | y            | 200    | Hybrid
Cannondale  | Knobby | Curved      | n            | 222.9  | Mountain
Nishiki     | Treads | Curved      | y            | 190.3  | Hybrid
Trek        | Treads | Straight    | y            | 196.8  | Hybrid
The file bikes.mff shows this data set in Mark's File Format. A file containing a valid data set begins with '@dataset' followed by an identifier. Attribute declarations appear next. Each declaration begins with the string '@attribute' and describes either a symbolic attribute or a numeric attribute: the attribute's name appears next, followed by its domain. The domain for a symbolic attribute is a list of values separated by whitespace. The domain for a numeric attribute is not explicitly specified and is assumed to be the set of representable integer and floating-point numbers.
The token '@examples' separates the attribute declarations from the examples, which are simply values separated by whitespace.
For simplicity, you can assume that all elements of the file are separated by at least one space character. Moreover, attribute declarations and examples will appear on single lines. I have defined a grammar for the file format.
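Because tokens are whitespace-separated and each declaration or example fits on one line, the format can be read line by line with `std::getline` and `std::istringstream`. The following is a minimal C++ sketch under stated assumptions, not the required class design: the names `AttributeDecl`, `DataSetSketch`, `encodeValue`, and `parseMff` are hypothetical, and it assumes that a numeric attribute is declared with the single domain token `numeric` and that the data set's identifier follows '@dataset' on the same line. It stores each symbolic value as its index in the attribute's domain, a common encoding that keeps every example a vector of numbers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical type for one attribute declaration.
struct AttributeDecl {
    std::string name;
    bool numeric = false;             // assumption: domain token "numeric"
    std::vector<std::string> values;  // symbolic domain; empty if numeric
};

// Hypothetical container for a whole data set.
struct DataSetSketch {
    std::string name;
    std::vector<AttributeDecl> attributes;
    // One row per example: numeric values as-is, symbolic values as the
    // index of the value in its attribute's domain.
    std::vector<std::vector<double>> examples;
};

// Encodes one token according to the type of its attribute.
double encodeValue(const AttributeDecl& attr, const std::string& token) {
    if (attr.numeric) return std::stod(token);
    auto it = std::find(attr.values.begin(), attr.values.end(), token);
    assert(it != attr.values.end() && "value not in the symbolic domain");
    return static_cast<double>(it - attr.values.begin());
}

// Reads '@dataset', then '@attribute' lines, then '@examples' and the rows.
DataSetSketch parseMff(std::istream& in) {
    DataSetSketch ds;
    std::string line;
    bool inExamples = false;
    while (std::getline(in, line)) {
        std::istringstream tokens(line);
        std::string first;
        if (!(tokens >> first)) continue;  // skip blank lines
        if (first == "@dataset") {
            tokens >> ds.name;  // assumption: identifier on the same line
        } else if (first == "@attribute") {
            AttributeDecl attr;
            tokens >> attr.name;
            for (std::string v; tokens >> v;) attr.values.push_back(v);
            if (attr.values.size() == 1 && attr.values[0] == "numeric") {
                attr.numeric = true;
                attr.values.clear();
            }
            ds.attributes.push_back(attr);
        } else if (first == "@examples") {
            inExamples = true;
        } else if (inExamples) {
            std::vector<double> example;
            std::string token = first;
            do {
                std::size_t i = example.size();  // attribute for this token
                assert(i < ds.attributes.size() && "too many values");
                example.push_back(encodeValue(ds.attributes[i], token));
            } while (tokens >> token);
            assert(example.size() == ds.attributes.size() && "too few values");
            ds.examples.push_back(example);
        }
    }
    return ds;
}
```

The asserts are development aids only, in the spirit of using error checks to track down bugs; they vanish when compiled with `NDEBUG`. Writing a data set back out reverses `encodeValue`: print numeric values directly and look symbolic values up by index.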
As for the design, I have produced documentation for the classes and their methods for C++ and for Java. With the exception of exceptions, you must follow this design. Feel free to propose refinements as part of class discussion. As for exceptions, we will not evaluate your program for error handling, and so you need not implement exception handling at all. On the other hand, you might find that implementing exception handling helps you track down bugs during development. Note that in subsequent projects, you will need to make deep copies of training and testing sets.
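In C++, a class whose members are standard containers of values already gets a deep copy from its compiler-generated copy constructor. Deep copying takes more care when a class owns polymorphic objects through pointers, as a hierarchy of symbolic and numeric attributes might. One common approach is a virtual `clone` method; the sketch below assumes such a hierarchy, and the class names `Attribute`, `NumericAttribute`, `SymbolicAttribute`, and `AttributeList` are hypothetical rather than taken from the course's design.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical attribute hierarchy; the course's design may differ.
class Attribute {
public:
    explicit Attribute(std::string name) : name_(std::move(name)) {}
    virtual ~Attribute() = default;
    virtual std::unique_ptr<Attribute> clone() const = 0;  // polymorphic copy
    const std::string& name() const { return name_; }
    void setName(std::string name) { name_ = std::move(name); }
private:
    std::string name_;
};

class NumericAttribute : public Attribute {
public:
    using Attribute::Attribute;  // inherit the name constructor
    std::unique_ptr<Attribute> clone() const override {
        return std::make_unique<NumericAttribute>(*this);
    }
};

class SymbolicAttribute : public Attribute {
public:
    SymbolicAttribute(std::string name, std::vector<std::string> values)
        : Attribute(std::move(name)), values_(std::move(values)) {}
    std::unique_ptr<Attribute> clone() const override {
        return std::make_unique<SymbolicAttribute>(*this);
    }
private:
    std::vector<std::string> values_;
};

// Owns its attributes; copying clones each one, so copies share nothing.
class AttributeList {
public:
    AttributeList() = default;
    AttributeList(const AttributeList& other) {
        for (const auto& a : other.attrs_) attrs_.push_back(a->clone());
    }
    AttributeList(AttributeList&&) = default;
    AttributeList& operator=(AttributeList other) {  // copy-and-swap
        std::swap(attrs_, other.attrs_);
        return *this;
    }
    void add(std::unique_ptr<Attribute> a) { attrs_.push_back(std::move(a)); }
    std::size_t size() const { return attrs_.size(); }
    Attribute& at(std::size_t i) {
        assert(i < attrs_.size());
        return *attrs_[i];
    }
private:
    std::vector<std::unique_ptr<Attribute>> attrs_;
};
```

A shallow copy of the `unique_ptr` vector is not even possible, which is one reason to prefer it over raw pointers here: the compiler forces the owning class to say how copies are made.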
% a.out -t bikes.mff
% a.out -t bikes.train
% a.out -t bikes.train -T bikes.test
% a.out -t bikes-tr.mff -T bikes-te.mff
The program can simply read the input files, populate the appropriate objects, and output the data set or sets to the console in the same format they appear in the input files. The program need not check for proper formatting or for data integrity.
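The example invocations always pass a file after `-t` and sometimes a second file after `-T`; judging by the file names, these are the training and testing sets, though that reading is an assumption. A minimal sketch of the option handling, using a hypothetical `Options` struct and `parseArgs` function, that treats `-t` as required and `-T` as optional:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical option holder; "-t" is assumed required, "-T" optional.
struct Options {
    std::string trainFile;  // argument of -t
    std::string testFile;   // argument of -T; empty if not given
    bool ok = false;        // false if the arguments were malformed
};

// Takes the arguments after the program name, e.g. built in main() as
// std::vector<std::string>(argv + 1, argv + argc).
Options parseArgs(const std::vector<std::string>& args) {
    Options opts;
    for (std::size_t i = 0; i < args.size(); ++i) {
        bool hasValue = i + 1 < args.size();
        if (args[i] == "-t" && hasValue) {
            opts.trainFile = args[++i];
        } else if (args[i] == "-T" && hasValue) {
            opts.testFile = args[++i];
        } else {
            return opts;  // unknown option or missing file name; ok stays false
        }
    }
    opts.ok = !opts.trainFile.empty();
    return opts;
}
```

Taking a `std::vector<std::string>` instead of `argc` and `argv` keeps the function easy to exercise without launching the program; `main` can print a usage message and exit when `ok` is false.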
For this and subsequent projects, you must implement your program using only the standard libraries the language provides, meaning that you cannot use a library of numeric, statistical, or machine-learning routines to complete the project. If you want to use external libraries or something non-standard, check with me first.
Finally, for this and subsequent projects, it must be easy for me to build and run your project. Include a README file that describes exactly how to build and run it. If you use Java, do not use packages. If you use C++, provide a Makefile. Ultimately, I should be able to drop into your project's directory, compile it if necessary, and execute it.
//
// Name
// E-mail Address
// Platform: Windows, OS X, Linux, etc.
// Language/Environment: gcc, g++, java, ruby, python
//
// In accordance with the class policies and Georgetown's Honor Code,
// I certify that, with the exceptions of the class resources and those
// items noted below, I have neither given nor received any assistance
// on this project.
//
When you are ready to submit your program for grading, create a zip file of the directory containing only your project's source and build instructions, and upload it to Blackboard.