Project 5
Fall 2017
Due: Thu, Dec 7 @ 11:59 P.M.
13 points
Implement IREPc as discussed in Sections 2.2 and 2.3 of Cohen's paper entitled Fast Effective Rule Induction. You must use C++. You can use only the standard libraries included with the programming language. You must also provide a Makefile that produces the executable a.out. The Makefile must include a clean target that removes the executable and any .o files so the autograder can produce the executable from the source code.
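As a point of reference for the rule-growing step, Cohen's paper describes adding conditions that maximize FOIL's information gain. The sketch below shows one way to compute that quantity. The function name `foilGain` and its parameters are hypothetical, the handling of a candidate that covers no positive examples is an assumed convention, and the heuristic covered in lecture may differ.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// Sketch of FOIL's information gain for adding one condition to a rule.
// p0, n0: positive/negative examples covered before adding the condition.
// p1, n1: positive/negative examples covered after adding the condition.
// (Hypothetical helper; names are illustrative, not from the assignment.)
double foilGain(int p0, int n0, int p1, int n1) {
    assert(p0 >= 0 && n0 >= 0 && p1 >= 0 && n1 >= 0);
    // Assumed convention: a condition that leaves no positives covered
    // is ranked below every other candidate.
    if (p0 == 0 || p1 == 0) {
        return -std::numeric_limits<double>::infinity();
    }
    double before = std::log2(static_cast<double>(p0) / (p0 + n0));
    double after = std::log2(static_cast<double>(p1) / (p1 + n1));
    return p1 * (after - before);
}
```

With this definition, a condition that narrows coverage from 10 positives and 10 negatives to 5 positives and no negatives scores higher than one that keeps the same positive-to-negative ratio, which scores zero.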
We will talk about rule-selection heuristics in lecture. To make the project tractable, your implementation does not need to handle numeric attributes, it does not need to prune, and it needs to work only with the 1984 Congressional Voting Record. See also the names file. I created a version of this data set in a simplified format: votes-comments.dta. I also put this file on cs-class, which you can retrieve using the command:
cs-class% cp ~maloofm/cosc270/votes-comments.dta ./
Feel free to remove the comments and read the data set into your program, or you can hard-code the data set into your program. You can remove comments and reorganize the data, but you cannot modify the data in any other way.
To evaluate the learned rules, implement the hold-out method, which involves selecting a random set of the original examples as a training set and using the remaining examples as a testing set. Use 75% of the original examples as the training set. The training set serves as input to IREP. Once the program produces a set of rules, then it evaluates the rules on the examples of the test set. Seed the random number generator with the system clock.
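The hold-out split described above can be sketched as a shuffle of example indices followed by a cut at the training fraction. This is a minimal sketch, not a required design: the helper names `holdOutSplit` and `clockSeededRng` are hypothetical, and it assumes examples are stored in a container that can be indexed by position.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: randomly partitions the indices 0..n-1 into
// (training indices, testing indices) using the given training fraction.
std::pair<std::vector<std::size_t>, std::vector<std::size_t>>
holdOutSplit(std::size_t n, double trainFraction, std::mt19937& rng) {
    assert(trainFraction >= 0.0 && trainFraction <= 1.0);
    std::vector<std::size_t> idx(n);
    std::iota(idx.begin(), idx.end(), std::size_t{0});
    std::shuffle(idx.begin(), idx.end(), rng);
    // Truncates toward zero, so any leftover example goes to the test set.
    std::size_t nTrain = static_cast<std::size_t>(n * trainFraction);
    std::vector<std::size_t> train(idx.begin(), idx.begin() + nTrain);
    std::vector<std::size_t> test(idx.begin() + nTrain, idx.end());
    return {train, test};
}

// Hypothetical helper: builds a generator seeded from the system clock.
std::mt19937 clockSeededRng() {
    auto ticks = std::chrono::system_clock::now().time_since_epoch().count();
    return std::mt19937(static_cast<std::mt19937::result_type>(ticks));
}
```

Creating the generator once with `clockSeededRng()` and passing it to `holdOutSplit(examples.size(), 0.75, rng)` on each of the five runs gives a different split per run without reseeding.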
main should produce five sets of rules, each from a different training and testing split generated by the hold-out method. For each rule set produced, main should print the rules and the accuracy of those rules on the testing set for both classes (i.e., the overall accuracy) and for each class (i.e., the true-positive and true-negative rates). It should then print the average accuracy of the five rule sets as the JSON string {"Average Accuracy": <accuracy>}, where <accuracy> is a floating-point number.
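The reporting described above can be sketched as two small functions: one that computes the three rates from actual and predicted labels, and one that formats the final JSON string. The names `Metrics`, `evaluate`, and `averageAccuracyJson` are hypothetical, and the sketch assumes labels are encoded as booleans with `true` standing for whichever class is treated as positive.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical container for the three numbers reported per rule set.
struct Metrics {
    double accuracy;          // correct predictions over all test examples
    double truePositiveRate;  // correct positives over actual positives
    double trueNegativeRate;  // correct negatives over actual negatives
};

// Assumes true = positive class; both vectors describe the same test set.
Metrics evaluate(const std::vector<bool>& actual,
                 const std::vector<bool>& predicted) {
    assert(actual.size() == predicted.size());
    int tp = 0, tn = 0, fp = 0, fn = 0;
    for (std::size_t i = 0; i < actual.size(); ++i) {
        if (actual[i]) {
            if (predicted[i]) ++tp; else ++fn;
        } else {
            if (predicted[i]) ++fp; else ++tn;
        }
    }
    int pos = tp + fn, neg = tn + fp, total = pos + neg;
    Metrics m;
    // Rates default to 0 when the denominator would be zero.
    m.accuracy = total ? static_cast<double>(tp + tn) / total : 0.0;
    m.truePositiveRate = pos ? static_cast<double>(tp) / pos : 0.0;
    m.trueNegativeRate = neg ? static_cast<double>(tn) / neg : 0.0;
    return m;
}

// Formats the mean of the per-run accuracies as the required JSON string.
std::string averageAccuracyJson(const std::vector<double>& accuracies) {
    assert(!accuracies.empty());
    double sum = 0.0;
    for (double a : accuracies) sum += a;
    std::ostringstream out;
    out << "{\"Average Accuracy\": " << sum / accuracies.size() << "}";
    return out.str();
}
```

In main, the accuracy from each of the five runs would be collected into a vector and the result of `averageAccuracyJson` printed last.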
Name
NetID
In accordance with the class policies and Georgetown's Honor Code, I certify that, with the exceptions of the course materials and those items noted below, I have neither given nor received any assistance on this project.
When you are ready to submit your project, create the zip file for uploading by typing:
$ zip submit.zip *.cpp *.h Makefile HONOR
Upload submit.zip to Autolab. You can submit your project p5 five times.
Copyright © 2019 Mark Maloof. All Rights Reserved. This material may not be published, broadcast, rewritten, or redistributed.