COSC-575: Machine Learning

Project 4
Fall 2018

Due: W 11/21 @ 11:59 P.M.
10 points

Building upon your previous projects, naturally, implement Flach's primal version of the perceptron as Perceptron, Flach's kernel perceptron with a polynomial kernel as KernelPerceptron, and Zurada's multi-layer feed-forward neural network trained with the backpropagation algorithm as BP. For all three methods, terminate training if it has not converged after 50,000 iterations. For getDistribution, use the hardmax function on the network's output; we'll discuss this in class. For Perceptron, use a learning rate of 0.9. For Perceptron and KernelPerceptron, implement Flach's method of probability calibration. Both should use the switch -fc to enable calibration.
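As a rough illustration of two of the pieces named above, here is a minimal sketch of a hardmax function and a polynomial kernel. The class name OutputUtils, the method names, the tie-breaking rule, and the kernel form (x·y + offset)^degree with its parameters are all assumptions made for illustration; check them against Flach's text and the class discussion before relying on them.

```java
// Hypothetical helper class; not part of the required project API.
class OutputUtils {

    // Hardmax: 1.0 at the position of the largest output, 0.0 elsewhere.
    // Assumption: ties are broken in favor of the first maximum.
    static double[] hardmax(double[] outputs) {
        double[] dist = new double[outputs.length];
        if (outputs.length == 0) {
            return dist;
        }
        int best = 0;
        for (int i = 1; i < outputs.length; i++) {
            if (outputs[i] > outputs[best]) {
                best = i;
            }
        }
        dist[best] = 1.0;
        return dist;
    }

    // Dot product of two vectors of equal length.
    static double dot(double[] a, double[] b) {
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    // One common polynomial kernel: (a . b + offset)^degree.
    // Assumption: the degree and offset are free parameters here.
    static double polynomialKernel(double[] a, double[] b, int degree, double offset) {
        return Math.pow(dot(a, b) + offset, degree);
    }
}
```

For example, under these assumptions hardmax applied to the outputs (0.2, 0.7, 0.1) yields the distribution (0, 1, 0).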

For BP, initialize the weights to small random values. Make sure BP uses a different random seed each time it is constructed. If train reaches 50,000 iterations, it should throw FailedToConvergeException, which you can derive from RuntimeException. Use a learning rate of 0.9, a minimum error of 0.1, and the option -J to specify the number of units in the hidden layer, not including the bias unit.
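The requirements above on seeding, the iteration cap, and the exception can be sketched as follows. This is only a skeleton under stated assumptions: the class BPSketch, its fields, the weight range [-0.1, 0.1), and the train signature (which takes a stand-in function reporting the error after each iteration instead of doing real backpropagation) are hypothetical. Whether training stops when the error is strictly below the minimum error, and whether the 50,000th iteration itself counts, are also assumptions.

```java
import java.util.Random;
import java.util.function.IntToDoubleFunction;

// Unchecked exception signalling that training hit the iteration cap.
class FailedToConvergeException extends RuntimeException {
    FailedToConvergeException(String message) {
        super(message);
    }
}

// Hypothetical skeleton; the real BP class needs actual forward and backward passes.
class BPSketch {
    static final int MAX_ITERATIONS = 50_000;
    static final double MIN_ERROR = 0.1;

    // The no-argument Random constructor chooses a seed that is very
    // likely to differ for every instance that is constructed.
    final Random rng = new Random();
    final double[][] hiddenWeights;

    BPSketch(int inputs, int hiddenUnits) {
        hiddenWeights = new double[hiddenUnits][inputs];
        for (double[] row : hiddenWeights) {
            for (int j = 0; j < row.length; j++) {
                // Assumption: "small" means roughly [-0.1, 0.1).
                row[j] = rng.nextDouble() * 0.2 - 0.1;
            }
        }
    }

    // Returns the iteration at which the error dropped below MIN_ERROR,
    // or throws if that never happens within MAX_ITERATIONS.
    int train(IntToDoubleFunction errorAfterIteration) {
        for (int iter = 1; iter <= MAX_ITERATIONS; iter++) {
            if (errorAfterIteration.applyAsDouble(iter) < MIN_ERROR) {
                return iter;
            }
        }
        throw new FailedToConvergeException(
            "no convergence after " + MAX_ITERATIONS + " iterations");
    }
}
```

Because FailedToConvergeException extends RuntimeException, callers are not forced to declare or catch it, but a driver can still catch it to report the failure cleanly.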

Implement routines to map the examples of a data set into a homogeneous coordinate system. For data sets consisting of only nominal attributes, use either a binary or linear encoding for the attributes. For the class label, use a linear binary or bipolar encoding. The perceptrons should use a bipolar encoding, and the multi-layer neural network should use a binary encoding. You can assume that an example passed to classify( Example ) and getDistribution( Example ) is in a homogeneous coordinate system. You do not need to worry about data sets with both numeric and nominal attributes.
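One way to picture the mappings described above is the sketch below. It reads "binary encoding" of an attribute as one indicator per nominal value and "linear encoding" as the value's index scaled to [0, 1]; both readings are assumptions, as is the choice to put the constant 1 first in the homogeneous vector. The class Encodings and its method names are hypothetical, and the label encodings assume a two-class problem.

```java
// Hypothetical helper class; names and conventions are illustrative only.
class Encodings {

    // Binary (one indicator per value) encoding of a nominal value.
    static double[] oneHot(int valueIndex, int domainSize) {
        double[] encoded = new double[domainSize];
        encoded[valueIndex] = 1.0;
        return encoded;
    }

    // Linear encoding: the value's index scaled to the range [0, 1].
    static double linear(int valueIndex, int domainSize) {
        return domainSize > 1 ? (double) valueIndex / (domainSize - 1) : 0.0;
    }

    // Homogeneous coordinates: add a constant 1 so the bias or threshold
    // becomes an ordinary weight. Assumption: the 1 is placed first.
    static double[] toHomogeneous(double[] x) {
        double[] h = new double[x.length + 1];
        h[0] = 1.0;
        System.arraycopy(x, 0, h, 1, x.length);
        return h;
    }

    // Bipolar label for the perceptrons: class 0 -> -1, class 1 -> +1.
    static double bipolarLabel(int classIndex) {
        return classIndex == 0 ? -1.0 : 1.0;
    }

    // Binary label for the multi-layer network: class 0 -> 0, class 1 -> 1.
    static double binaryLabel(int classIndex) {
        return classIndex == 0 ? 0.0 : 1.0;
    }
}
```

Under these assumptions, an example with a single three-valued attribute set to its second value would be encoded as (0, 1, 0) and then mapped to (1, 0, 1, 0) in homogeneous coordinates.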

Here are some new data sets:

As we have discussed in class, training neural networks can be computationally expensive, and training may not converge. Develop your implementation by training and testing on the small data sets, such as bikes and xor, until you are confident that everything is working. I recommend using the hold-out method for larger data sets, such as votes and mushroom. I do not recommend using k-fold cross-validation, although if you're using your own laptop, and you want to convert electricity to heat, then go ahead.

If you're using cs-class, please be mindful of other users on the system. If you want to kick off a big training job in the background and go to the Tombs, please be nice and use nice. For example:

cs-class$ nice java BP -t cats-and-dogs.mff < /dev/null >| output &
This command runs BP with a nice priority. The fancy redirects prevent ssh from hanging when you log out and write the output of BP to the file named output. The final ampersand puts the job into the background, where it will run for a long time. At this point, you can log out and head over to the Tombs.

When you reconnect to cs-class, you can check to see if the job is still running by looking for the name of your executable (in this case, java BP) in the list of active processes:

cs-class$ ps -ef | grep BP
maloofm  16205     1 98 15:37 ?        00:00:13 java BP -t cats-and-dogs.mff
maloofm  17920 16238  0 15:45 pts/5    00:00:00 grep BP
You can examine the contents of the output file by typing:
cs-class$ more output
If for some reason your implementation of BP seems like it will never terminate, please do not leave it running. To kill a job, look in the list of active processes for the job's ID:
cs-class$ ps -ef | grep BP
maloofm  16205     1 98 15:37 ?        00:00:13 java BP -t cats-and-dogs.mff
maloofm  17920 16238  0 15:45 pts/5    00:00:00 grep BP
In this case, it is 16205. Use the kill command to kill the process. It should no longer appear in the process list.
cs-class$ kill 16205
cs-class$ ps -ef | grep BP
maloofm  19129 16238  0 15:55 pts/5    00:00:00 grep BP

Instructions for Submission

In a file named HONOR, please include the statement:
In accordance with the class policies and Georgetown's Honor Code,
I certify that, with the exceptions of the class resources and those
items noted below, I have neither given nor received any assistance
on this project.
Include this file in your zip file submit.zip.

Submit p4 exactly like you submitted p3. Make sure you remove all debugging output before submitting.

Plan B

If Autolab is down, upload your zip file to Canvas.

Copyright © 2019 Mark Maloof. All Rights Reserved. This material may not be published, broadcast, rewritten, or redistributed.