COSC 388: Machine Learning
Project Guidelines
This research project is designed to let you concentrate on a problem
of interest that is related to machine learning. It consists of two
assignments, one of which is graded:
- the prospectus,
- the final paper.
When submitting these documents, hard copies are fine. If you
want to submit electronically, I will accept only documents
in Adobe's Portable Document Format (PDF).
The types of projects listed below would be acceptable, but you are
certainly not limited to them, so be creative.
However, you must do something computational.
Implementations
If you have solid programming skills, then you could attempt to implement
a learning method, such as a decision tree or a Bayesian network.
You would need to apply it
to a few simple problems to demonstrate that it works. For example,
one student implemented the AQ learning algorithm in Java and applied
it to the mountain bike and iris data sets. You could also extend
one of the programming projects.
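To give a rough sense of scale for an implementation project, here is a
minimal sketch of a one-level decision tree (a "decision stump") in Python.
It is only an illustration, not a required design: the function names and
the four training examples are invented for this sketch, and a real project
would use an actual data set and a full learning method.

```python
from collections import Counter

def majority(labels):
    """Return the most common label in a non-empty list."""
    return Counter(labels).most_common(1)[0][0]

def train_stump(X, y):
    """Fit a one-level decision tree: choose the (feature, threshold) split
    that misclassifies the fewest training examples."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [lab for row, lab in zip(X, y) if row[f] <= t]
            right = [lab for row, lab in zip(X, y) if row[f] > t]
            if not left or not right:
                continue  # a split must put examples on both sides
            l_lab, r_lab = majority(left), majority(right)
            errors = (sum(lab != l_lab for lab in left)
                      + sum(lab != r_lab for lab in right))
            if best is None or errors < best[0]:
                best = (errors, f, t, l_lab, r_lab)
    if best is None:
        # No usable split (all feature values identical): predict the majority.
        lab = majority(y)
        return (0, float("inf"), lab, lab)
    return best[1:]

def predict_stump(stump, row):
    """Classify one example with a trained stump."""
    f, t, l_lab, r_lab = stump
    return l_lab if row[f] <= t else r_lab

# Invented toy measurements (NOT the real iris data), two features per example.
X = [[1.4, 0.2], [1.3, 0.2], [4.7, 1.4], [4.5, 1.5]]
y = ["setosa", "setosa", "versicolor", "versicolor"]

stump = train_stump(X, y)
print(stump)
print(predict_stump(stump, [1.2, 0.2]), predict_stump(stump, [5.0, 1.7]))
```

A project along these lines would go further: grow a full tree, handle
nominal attributes, and report results on held-out data rather than on the
training examples.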
Experimental Studies
You could also take advantage of the many implementations that exist
and conduct an empirical study of how they perform for a set of problems.
For example, you could determine which machine learning technique
best predicts how representatives in Congress vote. One student found
a data set from a competition for predicting electricity demand and
used implementations of various machine-learning algorithms in an
attempt to best the winner of the competition. Plus, predicting
demand for electricity is kind of important. Just ask any Californian.
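As a hedged illustration of what "conduct an empirical study" can mean in
practice, the sketch below compares two learners by k-fold cross-validation.
Everything in it is an assumption made for the example: the two learners
(a majority-class baseline and 1-nearest-neighbor), the helper names, and
the synthetic "nay"/"yea" data. In a real study you would plug in existing
implementations and a real data set.

```python
from collections import Counter

def majority_learner(train_X, train_y):
    """Baseline: always predict the most common training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return lambda row: label

def one_nn_learner(train_X, train_y):
    """1-nearest-neighbor using squared Euclidean distance."""
    def predict(row):
        dists = [sum((a - b) ** 2 for a, b in zip(row, x)) for x in train_X]
        return train_y[dists.index(min(dists))]
    return predict

def cross_val_accuracy(learner, X, y, k=5):
    """Average accuracy over k folds; example i goes to fold i % k."""
    correct = 0
    for fold in range(k):
        train_idx = [i for i in range(len(X)) if i % k != fold]
        test_idx = [i for i in range(len(X)) if i % k == fold]
        model = learner([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct += sum(model(X[i]) == y[i] for i in test_idx)
    return correct / len(X)

# Synthetic, well-separated data invented for this sketch.
X = [[i * 0.1, 0.0] for i in range(10)] + [[5.0 + i * 0.1, 1.0] for i in range(10)]
y = ["nay"] * 10 + ["yea"] * 10

for name, learner in [("majority", majority_learner), ("1-NN", one_nn_learner)]:
    print(name, cross_val_accuracy(learner, X, y))
```

Including a trivial baseline such as the majority-class learner is a useful
habit: it tells you how much of a method's accuracy comes from the data
alone.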
Still Stuck?
If you're still having difficulty converging on a topic, come see
me. I'll be happy to hand you a project. I have a lot of data,
a lot of code, and a lot of unanswered questions. My interests
lie in the area of machine learning, so if you're intrigued by
that, here are some possibilities (aka shameless plugs):
- I'm interested in concept drift, in which the target concept
changes over time. There are a few on-line learning algorithms for
which it'd be interesting to see how well they perform on a set
of drifting concepts.
- I have some audit trail data for a computer intrusion detection
task, and I have some ideas about an experiment that would compare
two prevailing approaches.
- There is an unsupervised learning method called "hypergraph
clustering." Learn about it, implement it, and compare it to
existing approaches, like k-means, using, say, some gene data.
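For the concept-drift item above, one common way to evaluate an on-line
learner is "test-then-train": predict each example as it arrives, score the
prediction, and only then learn from it. The sketch below is a toy version
under stated assumptions. The stream, the point at which the concept flips,
and the two learners (nearest-neighbor with unlimited memory versus a short
sliding window) are all invented for illustration and are not the specific
algorithms I have in mind.

```python
from collections import deque

def nearest_label(memory, x):
    """Label of the stored example closest to x (oldest wins ties),
    or None if nothing has been stored yet."""
    if not memory:
        return None
    return min(memory, key=lambda ex: abs(ex[0] - x))[1]

def prequential_accuracy(stream, window=None):
    """Test-then-train accuracy of a nearest-neighbor learner.
    window=None keeps every example; window=n keeps only the last n."""
    memory = deque(maxlen=window)
    correct = 0
    for x, label in stream:
        correct += nearest_label(memory, x) == label  # test first
        memory.append((x, label))                     # then train
    return correct / len(stream)

# Invented drifting stream: the concept is "x > 0.5" for the first 20
# examples and flips to "x <= 0.5" for the last 20.
xs = [0.1, 0.9, 0.3, 0.7]
stream = []
for i in range(40):
    x = xs[i % 4]
    label = (x > 0.5) if i < 20 else (x <= 0.5)
    stream.append((x, label))

full = prequential_accuracy(stream)
windowed = prequential_accuracy(stream, window=4)
print(full, windowed)
```

On this toy stream the windowed learner forgets the old concept within a few
examples, while the full-memory learner keeps answering from stale examples.
A real experiment would use genuine on-line algorithms and several kinds of
drift (sudden, gradual, recurring).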