Project 4
Fall 2019
Due: F 11/22 @ 5 P.M.
10 points
Implement IREP as discussed in Sections 2.2 and 2.3 of Cohen's paper "Fast Effective Rule Induction."
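In Cohen's IREP, each rule is grown by starting from an empty conjunction and repeatedly adding the condition that maximizes FOIL's information gain, gain = p1 (log2(p1/(p1+n1)) - log2(p0/(p0+n0))), until the rule covers no negative examples. The Java sketch below illustrates that growing step for nominal attributes only. GrowRuleSketch, its Example type, and the methods covers, foilGain, and growRule are hypothetical names. Treating every distinct value an attribute takes among the covered examples as a candidate test is an assumption. The sketch does not set aside a pruning set, because this project does not prune.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of a GrowRule step for nominal attributes (names are illustrative). */
class GrowRuleSketch {

    /** One training example: nominal attribute values plus a flag for the positive class. */
    static final class Example {
        final String[] values;
        final boolean positive;

        Example(String[] values, boolean positive) {
            this.values = values;
            this.positive = positive;
        }
    }

    /** A rule is a conjunction of tests "attribute index == value"; true if all tests hold. */
    static boolean covers(Map<Integer, String> rule, Example e) {
        for (Map.Entry<Integer, String> test : rule.entrySet()) {
            if (!e.values[test.getKey()].equals(test.getValue())) {
                return false;
            }
        }
        return true;
    }

    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** FOIL's gain for refining a rule covering (p0, n0) into one covering (p1, n1). */
    static double foilGain(int p0, int n0, int p1, int n1) {
        if (p1 == 0) {
            return Double.NEGATIVE_INFINITY; // covers no positives: never worth adding
        }
        return p1 * (log2((double) p1 / (p1 + n1)) - log2((double) p0 / (p0 + n0)));
    }

    /** Adds the highest-gain condition until the rule covers no negative examples. */
    static Map<Integer, String> growRule(List<Example> examples, int numAttributes) {
        Map<Integer, String> rule = new LinkedHashMap<>();
        List<Example> covered = new ArrayList<>(examples);
        while (covered.stream().anyMatch(e -> !e.positive)) {
            int p0 = (int) covered.stream().filter(e -> e.positive).count();
            int n0 = covered.size() - p0;
            double bestGain = Double.NEGATIVE_INFINITY;
            int bestAttr = -1;
            String bestValue = null;
            for (int a = 0; a < numAttributes; a++) {
                if (rule.containsKey(a)) {
                    continue; // test each attribute at most once per rule
                }
                Set<String> candidates = new LinkedHashSet<>();
                for (Example e : covered) {
                    candidates.add(e.values[a]);
                }
                for (String v : candidates) {
                    int p1 = 0, n1 = 0;
                    for (Example e : covered) {
                        if (e.values[a].equals(v)) {
                            if (e.positive) p1++; else n1++;
                        }
                    }
                    double gain = foilGain(p0, n0, p1, n1);
                    if (gain > bestGain) {
                        bestGain = gain;
                        bestAttr = a;
                        bestValue = v;
                    }
                }
            }
            if (bestAttr < 0) {
                break; // no remaining condition covers a positive example; stop growing
            }
            rule.put(bestAttr, bestValue);
            final int attr = bestAttr;
            final String value = bestValue;
            covered.removeIf(e -> !e.values[attr].equals(value)); // keep only what the rule still covers
        }
        return rule;
    }
}
```

Because each iteration adds a test on an attribute not yet used, the loop ends after at most numAttributes conditions even if some negatives cannot be excluded.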
We will talk about rule-selection heuristics in lecture. To make the project tractable, your implementation does not need to handle numeric attributes, it does not need to prune, and it needs to work only with the 1984 Congressional Voting Record. See also the names file. I created a version of this data set in a simplified format: votes-comments.dta. I also put this file on cs-class, which you can retrieve using the command:
cs-class% cp ~maloofm/cosc270/votes-comments.dta ./
Feel free to remove the comments and read the data set into your program, or hard-code the data set into your program instead. You may remove comments and reorganize the data, but you may not modify the data in any other way.
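The layout of votes-comments.dta is not described here, so the loader below is only a sketch under stated assumptions: fields are separated by commas and/or whitespace, the class label is the first field, and comment lines begin with a caller-supplied prefix. VotesLoaderSketch, Example, parse, and load are hypothetical names, and the parsing should be adjusted to match the actual file. In keeping with the rules above, it skips comments and blank lines but leaves every data value unchanged.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical loader; the layout of votes-comments.dta is ASSUMED, not taken from the file. */
class VotesLoaderSketch {

    /** One labeled example: the class label and the nominal attribute values. */
    static final class Example {
        final String label;
        final String[] values;

        Example(String label, String[] values) {
            this.label = label;
            this.values = values;
        }
    }

    /**
     * Parses lines of text into examples. ASSUMPTIONS: fields are separated by commas
     * and/or whitespace, the class label is the first field, and comment lines begin
     * with commentPrefix.
     */
    static List<Example> parse(List<String> lines, String commentPrefix) {
        List<Example> examples = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith(commentPrefix)) {
                continue; // skip blank lines and comments; data lines are kept as-is
            }
            String[] fields = line.split("[,\\s]+");
            examples.add(new Example(fields[0], Arrays.copyOfRange(fields, 1, fields.length)));
        }
        return examples;
    }

    /** Reads and parses a file such as votes-comments.dta (same format assumptions). */
    static List<Example> load(Path path, String commentPrefix) throws IOException {
        return parse(Files.readAllLines(path), commentPrefix);
    }
}
```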
To evaluate the learned rules, implement the hold-out method, which involves selecting a random set of the original examples as a training set and using the remaining examples as a testing set. Use 75% of the original examples as the training set. The training set serves as input to IREP. Once the program produces a set of rules, it should evaluate those rules on the examples in the testing set. Seed the random number generator with the system clock so that you get a different selection of examples and a different set of rules each time you run IREP.
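One way to implement the hold-out split is to shuffle a copy of the examples with a clock-seeded java.util.Random and cut the shuffled list at 75%. HoldOutSketch, Split, and holdOut are hypothetical names, and rounding the cut point to the nearest example is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/** Hypothetical hold-out helper: a random train/test partition of the examples. */
class HoldOutSketch {

    /** The two disjoint partitions produced by one hold-out split. */
    static final class Split<T> {
        final List<T> train;
        final List<T> test;

        Split(List<T> train, List<T> test) {
            this.train = train;
            this.test = test;
        }
    }

    /**
     * Shuffles a copy of the examples and places the first trainFraction of them
     * (rounded to the nearest example) in the training set and the rest in the testing set.
     */
    static <T> Split<T> holdOut(List<T> examples, double trainFraction, Random rng) {
        List<T> shuffled = new ArrayList<>(examples); // leave the caller's list untouched
        Collections.shuffle(shuffled, rng);
        int cut = (int) Math.round(trainFraction * shuffled.size());
        return new Split<>(new ArrayList<>(shuffled.subList(0, cut)),
                           new ArrayList<>(shuffled.subList(cut, shuffled.size())));
    }
}
```

A reasonable design choice is to create a single Random with new Random(System.currentTimeMillis()) at the start of main and pass it to every call to holdOut. Re-seeding from the clock before each of several quick successive splits could yield the same seed, and therefore the same split, more than once.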
IREP.main should produce five sets of rules, each learned from a different pair of training and testing sets generated by the hold-out method. For each rule set, main should print the rules, the accuracy of those rules on the testing set for both classes (i.e., the overall accuracy), and the accuracy for each class (i.e., the true-positive and true-negative rates). It should then print the average accuracy of the five rule sets as the JSON string {"Average Accuracy": <accuracy>}, where <accuracy> is a floating-point number.
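The per-run statistics reduce to counts from a two-class confusion matrix: overall accuracy is (TP + TN) / (TP + TN + FP + FN), the true-positive rate is TP / (TP + FN), and the true-negative rate is TN / (TN + FP). The sketch below computes these and formats the average as the required JSON string. EvaluationSketch and its methods are hypothetical names. Which party counts as the positive class is an assumption left to the caller, as is producing the predictions (in IREP's two-class setting, an example is typically predicted positive when at least one rule covers it).

```java
import java.util.List;

/** Hypothetical helpers for scoring a rule set on a testing set (names are illustrative). */
class EvaluationSketch {

    /** Confusion-matrix counts for a two-class problem. */
    static final class Counts {
        int tp, fn, tn, fp;
    }

    /** Tallies predictions against true labels; true means "positive class". */
    static Counts tally(boolean[] actual, boolean[] predicted) {
        Counts c = new Counts();
        for (int i = 0; i < actual.length; i++) {
            if (actual[i]) {
                if (predicted[i]) c.tp++; else c.fn++;
            } else {
                if (predicted[i]) c.fp++; else c.tn++;
            }
        }
        return c;
    }

    /** Overall accuracy over both classes. */
    static double accuracy(Counts c) {
        return (double) (c.tp + c.tn) / (c.tp + c.tn + c.fp + c.fn);
    }

    /** Accuracy on positive examples; NaN if the testing set has none. */
    static double truePositiveRate(Counts c) {
        return (double) c.tp / (c.tp + c.fn);
    }

    /** Accuracy on negative examples; NaN if the testing set has none. */
    static double trueNegativeRate(Counts c) {
        return (double) c.tn / (c.tn + c.fp);
    }

    /** Formats the mean of the per-run accuracies as {"Average Accuracy": <accuracy>}. */
    static String averageAccuracyJson(List<Double> accuracies) {
        double sum = 0.0;
        for (double a : accuracies) {
            sum += a;
        }
        return "{\"Average Accuracy\": " + (sum / accuracies.size()) + "}";
    }
}
```

For example, averageAccuracyJson(Arrays.asList(0.5, 1.0)) returns {"Average Accuracy": 0.75}.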
Name:
NetID:
In accordance with the class policies and Georgetown's Honor Code, I certify that, with the exception of the course materials and those items noted below, I have neither given nor received any assistance on this project.
When you are ready to submit your project for grading, put your source files, Makefile, and honor statement in a zip file named submit.zip. Upload the zip file to Autolab using the assignment p4. Make sure you remove all debugging output before submitting.
Copyright © 2019 Mark Maloof. All Rights Reserved. This material may not be published, broadcast, rewritten, or redistributed.