COSC 388: Project Guidelines

COSC 388: Machine Learning

Project Guidelines

This research project is designed to let you concentrate on a problem of interest that is related to machine learning. It consists of two assignments, one of which is graded: the prospectus and the final paper.

When submitting these documents, hard copies are fine. If you want to submit electronically, then I will only accept documents in Adobe's Portable Document Format (PDF).

The types of projects listed below would all be acceptable, but you are certainly not limited to them, so be creative. Whatever you choose, you must do something computational.

Implementations

If you have solid programming skills, then you could attempt to implement a learning method, such as a decision tree or a Bayesian network. You would need to apply it to a few simple problems to demonstrate that it works. For example, one student implemented the AQ learning algorithm in Java and applied it to the mountain bike and iris data sets. You could also extend one of the programming projects.

Experimental Studies

You could also take advantage of the many implementations that exist and conduct an empirical study of how they perform on a set of problems. For example, you could determine which machine learning technique best predicts how representatives in Congress vote. One student found a data set from a competition for predicting electricity demand and used implementations of various machine-learning algorithms in an attempt to best the winner of the competition. Plus, predicting demand for electricity is kind of important. Just ask any Californian.
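To give a rough idea of what the skeleton of an empirical study might look like, here is a minimal sketch in Python that compares two very simple learners using k-fold cross-validation. Everything in it is an assumption made for illustration: the toy data generator, the two learners (a majority-class baseline and 1-nearest neighbor), and all of the function names are hypothetical. A real study would use existing implementations and real data sets.

```python
import random

def majority_classifier(train):
    """Return a predictor that always guesses the most common training label."""
    labels = [y for _, y in train]
    guess = max(set(labels), key=labels.count)
    return lambda x: guess

def one_nearest_neighbor(train):
    """Return a predictor that copies the label of the closest training point."""
    def predict(x):
        nearest = min(train, key=lambda ex: sum((a - b) ** 2 for a, b in zip(ex[0], x)))
        return nearest[1]
    return predict

def cross_validate(learner, data, k=5):
    """Estimate accuracy with k-fold cross-validation."""
    folds = [data[i::k] for i in range(k)]
    correct = 0
    for i, test in enumerate(folds):
        # Train on every fold except the one held out for testing.
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        predict = learner(train)
        correct += sum(predict(x) == y for x, y in test)
    return correct / len(data)

def make_toy_data(n=100, seed=0):
    """Hypothetical two-class data: the label is 1 when the two features sum to more than 1."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = (rng.random(), rng.random())
        data.append((x, int(x[0] + x[1] > 1.0)))
    return data

if __name__ == "__main__":
    data = make_toy_data()
    for name, learner in [("majority", majority_classifier), ("1-NN", one_nearest_neighbor)]:
        print(name, cross_validate(learner, data))
```

Including a trivial baseline such as the majority-class learner is a design choice worth keeping in a real study, since it shows whether a method has learned anything at all.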

Still Stuck?

If you're still having difficulty converging on a topic, come see me. I'll be happy to hand you a project. I have a lot of data, a lot of code, and a lot of unanswered questions. My interests lie in the area of machine learning, so if you're intrigued by that, here are some possibilities (aka shameless plugs):

I'm interested in concept drift, which occurs when the concept being learned changes over time. There are a few on-line learning algorithms for which it'd be interesting to see how well they perform on a set of drifting concepts.
I have some audit trail data for a computer intrusion detection task, and I have some ideas about an experiment that would compare two prevailing approaches.
There is an unsupervised learning method called "hypergraph clustering." Learn about it, implement it, and compare it to existing approaches, like k-means, using, say, some gene data.
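To make the concept-drift idea above a little more concrete, here is a minimal sketch in Python of how one might measure an on-line learner on a drifting concept. It is only a toy under stated assumptions: the stream, the counting learner, and the function names are all hypothetical, and the learner is not one of the on-line algorithms I have in mind. Each example is used for testing first and for training second.

```python
from collections import Counter, deque

def drifting_stream(n=200):
    """Hypothetical stream: the target is y = x for the first half, then y = 1 - x."""
    for t in range(n):
        x = t % 2
        yield x, (x if t < n // 2 else 1 - x)

class CountingLearner:
    """Predicts the most frequent label seen for x, optionally over a recent window."""

    def __init__(self, window=None):
        # maxlen=None keeps every example; a finite window forgets old ones.
        self.memory = deque(maxlen=window)

    def predict(self, x):
        votes = Counter(y for seen_x, y in self.memory if seen_x == x)
        return votes.most_common(1)[0][0] if votes else 0

    def learn(self, x, y):
        self.memory.append((x, y))

def online_accuracy(learner, stream):
    """Test on each example first, then train on it."""
    correct = total = 0
    for x, y in stream:
        correct += learner.predict(x) == y
        learner.learn(x, y)
        total += 1
    return correct / total

if __name__ == "__main__":
    print("full history:", online_accuracy(CountingLearner(), drifting_stream()))
    print("window of 10:", online_accuracy(CountingLearner(window=10), drifting_stream()))
```

On this toy stream the learner that forgets old examples recovers after the concept changes, while the one that remembers everything does not. How real on-line algorithms behave on more realistic drifting concepts is the open question.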