Project 1
Fall 2022
Due: F 9/16 @ 5:00 PM
9 points
Make | Tires | Handle Bars | Water Bottle | Weight | Bike Type |
Trek | Knobby | Straight | y | 250.3 | Mountain |
Bridgestone | Treads | Straight | y | 200 | Hybrid |
Cannondale | Knobby | Curved | n | 222.9 | Mountain |
Nishiki | Treads | Curved | y | 190.3 | Hybrid |
Trek | Treads | Straight | y | 196.8 | Hybrid |
The file bikes.mff shows this data set in Mark's File Format. Files containing valid data sets begin with '@dataset' followed by an identifier. Attribute declarations appear next. Each declaration begins with the string '@attribute' and declares either a symbolic (nominal) attribute or a numeric attribute. The attribute's name appears next, followed by its domain. The domain for symbolic attributes is a list of values separated by whitespace. The domain for numeric attributes is not explicitly specified; the keyword 'numeric' takes its place, and the domain is assumed to be the set of representable integer and floating-point numbers.
The token '@examples' separates the attribute declarations from the examples, which are simply values separated by whitespace.
The following grammar specifies the syntax for data sets and their elements:
dataset ::= header attributes examples
header ::= '@dataset' identifier
attributes ::= attribute attribute attribute-list
attribute-list ::= attribute attribute-list | ε
attribute ::= '@attribute' identifier declaration
declaration ::= 'numeric' | nominal-values
examples ::= '@examples' example-list
example-list ::= example example-list | ε
example ::= attribute-value example | ε
attribute-value ::= identifier | number
nominal-values ::= identifier identifier identifier-list
identifier-list ::= identifier identifier-list | ε
identifier ::= non-whitespace-character non-whitespace-characters
non-whitespace-characters ::= non-whitespace-character non-whitespace-characters | ε
non-whitespace-character ::= 'a'..'z' | 'A'..'Z' | '0'..'9', '_'..'+'
number ::= -∞..+∞
For simplicity, you can assume that all elements of the file are separated by at least one space character. Moreover, attribute declarations and examples will appear on single lines. Note that although these elements appear on a single line, I encourage you to parse the tokens of the file as a stream. The only exception is the parsing of the domain of nominal attributes, since we do not know the size of the domain and Scanner provides no method for detecting the end of a line, at least as far as I know. It is, however, possible to parse this file without this exception.
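One way to handle the nominal-domain exception mentioned above is to read the remainder of the declaration's line with Scanner's nextLine method and tokenize that string with a second Scanner. The sketch below is only an illustration: the class DomainSketch, the method parseDomain, and the sample text are hypothetical and are not part of the provided design or data files.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical helper class for illustration; not part of the provided design.
class DomainSketch {

    // Reads the rest of the current line and splits it into whitespace-separated values.
    // Assumes the scanner is positioned just after the attribute's name.
    static List<String> parseDomain(Scanner scanner) {
        List<String> values = new ArrayList<>();
        Scanner line = new Scanner(scanner.nextLine());
        while (line.hasNext()) {
            values.add(line.next());
        }
        return values;
    }

    public static void main(String[] args) {
        // Invented sample text that follows the grammar; not the contents of bikes.mff.
        String text = "@attribute tires Knobby Treads\n@attribute weight numeric\n";
        Scanner scanner = new Scanner(text);
        scanner.next();                      // consume '@attribute'
        String name = scanner.next();        // consume the attribute's name
        System.out.println(name + " " + parseDomain(scanner));
        System.out.println(scanner.next());  // the outer scanner resumes on the next line
    }
}
```

After parseDomain returns, the outer Scanner is positioned at the start of the following line, so stream-style parsing can continue from there.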
As for the design, I have produced documentation for the classes and their methods. You must follow this design. To save you some keystrokes, I have put the source files with method signatures and several data sets in a zip file on “cs-class” (i.e., class-1.cs.georgetown.edu), which you can retrieve using the following command:
cs-class-1% cd
cs-class-1% cp ~maloofm/cosc288/p1.zip ./
cs-class-1% unzip p1.zip
% java p1 -t bikes.mff
% java p1 -t bikes.train
% java p1 -t bikes.train -T bikes.test
% java p1 -t bikes-tr.mff -T bikes-te.mff
For this project, the program can simply read the input files, populate the appropriate objects, and output the data set or sets to the console in the same format they appear in the file. The implementation must be general, meaning that it should work for all possible data sets based on the grammar.
The two most challenging aspects of this project are the passing of command-line arguments to their handlers and the passing of the Scanner object to the correct parse method. To handle command-line arguments (or options), simply pass the array of strings to the object's setOptions method. It should perform a linear search of the array for the options it handles. If setOptions finds such an option, then it should take the appropriate action, such as calling a method or setting a flag or value.
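The linear search described above might look something like the following sketch. The class OptionsSketch and its fields are hypothetical, and the reading of -t and -T as "training file" and "testing file" is an assumption based on the file names in the sample commands; follow the provided documentation for the real classes and fields.

```java
// Hypothetical class for illustration; the real classes come from the provided design.
class OptionsSketch {
    String trainFile = null;  // value following -t, if present (assumed meaning)
    String testFile = null;   // value following -T, if present (assumed meaning)

    // Linear search of the argument array for the options this object handles.
    // Options it does not recognize are skipped so that other objects can handle them.
    void setOptions(String[] options) {
        for (int i = 0; i < options.length; i++) {
            if (options[i].equals("-t") && i + 1 < options.length) {
                trainFile = options[++i];
            } else if (options[i].equals("-T") && i + 1 < options.length) {
                testFile = options[++i];
            }
        }
    }

    public static void main(String[] args) {
        OptionsSketch sketch = new OptionsSketch();
        sketch.setOptions(new String[] {"-t", "bikes.train", "-T", "bikes.test"});
        System.out.println(sketch.trainFile + " " + sketch.testFile);
    }
}
```

Because each object ignores options it does not handle, the same array of strings can be passed to several objects in turn.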
Similarly, you need to pass a Scanner object to the appropriate parse methods, which will extract its tokens, populate the appropriate fields, and either return or pass the Scanner object into some other parse method.
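As a rough illustration of handing one Scanner from a high-level parse method to a lower-level one, consider the sketch below. The class ParseSketch, its fields, and the sample text are hypothetical; the sketch skips attribute domains entirely and uses a Scanner built from a String rather than a File.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical class for illustration; names and fields are not the provided design.
class ParseSketch {
    String name;
    List<String> attributeNames = new ArrayList<>();

    // High-level parse: consumes the header, then passes the same Scanner on.
    void parse(Scanner scanner) {
        String token = scanner.next();
        if (!token.equals("@dataset")) {
            throw new IllegalArgumentException("expected @dataset, found " + token);
        }
        name = scanner.next();
        parseAttributes(scanner);
    }

    // Lower-level parse: consumes declarations while the next token is '@attribute'.
    void parseAttributes(Scanner scanner) {
        while (scanner.hasNext("@attribute")) {
            scanner.next();                      // consume '@attribute'
            attributeNames.add(scanner.next());  // the attribute's name
            scanner.nextLine();                  // this sketch skips the domain
        }
    }

    public static void main(String[] args) {
        // Invented sample text that follows the grammar; not the contents of bikes.mff.
        String text = "@dataset bikes\n"
                + "@attribute make Trek Bridgestone\n"
                + "@attribute weight numeric\n"
                + "@examples\n"
                + "Trek 250.3\n";
        Scanner scanner = new Scanner(text);
        ParseSketch sketch = new ParseSketch();
        sketch.parse(scanner);
        System.out.println(sketch.name + " " + sketch.attributeNames);
        System.out.println(scanner.next());  // the caller's Scanner is now at '@examples'
    }
}
```

Each method consumes only the tokens it is responsible for, so whichever method receives the Scanner next finds it positioned at the first token it needs.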
One approach could be to develop some of the low-level parsing routines (e.g., AttributeFactory) in isolation using strings before attempting to integrate them for parsing the entire file. Note that you can construct a Scanner from either a String or a File. Another approach is to develop the parsing routines in the order that the tokens and elements appear in the file. This requires at times parsing high-level elements (e.g., DataSet) and then parsing lower-level elements (i.e., Attributes and then AttributeFactory). You should take the approach that makes the most sense to you.
You can use any platform for development, but your project must compile on the command line using the following command within the directory containing your source:
$ javac *.java
This is the command that Autolab will use to build your project. Do not use packages.
The best way to test for this is to zip up your source, transfer it to the class-1 server (or to another directory on your laptop), and compile and run your program and unit tests from the command line.
In accordance with the class policies and Georgetown's Honor System, I certify that, with the exceptions of the class resources and those items noted below, I have neither given nor received any assistance on this project.
Name
NetID
Include this file with your submission. When you are ready to upload to Autolab, create the zip file for submission by typing:
$ zip submit.zip *.java HONOR
Your source files must not be in a directory.
You can submit to Autolab seven times. You can perform five compile checks, to make sure your code compiles against the autograder, and you can perform two project submissions that are automatically graded. The last automatically graded submission is the grade for the project. Make sure you remove all debugging output before submitting.
Submit via Autolab.
Copyright © 2022 Mark Maloof. All Rights Reserved. This material may not be published, broadcast, rewritten, or redistributed.