This research project is designed to let you concentrate on a problem of interest that is related to machine learning. It consists of three parts: the prospectus, the presentation, and the paper. Two of these parts are graded.

When submitting these documents, hard copies are fine. If you submit electronically, the documents must be in Adobe's Portable Document Format (PDF).

Your project can take one of three forms:

- an application
- a theoretical study
- an experimental study

Since I am an experimentalist, I am in the best position to evaluate and help with experimental studies. Developing a new learning method might be a bit too ambitious for a first, semester-long class in machine learning, so I anticipate that most students will choose to extend an existing study or conduct an evaluation to characterize some aspect of existing methods.

An experiment is a systematic procedure designed to test a research hypothesis. Let us say an experiment consists of five elements: an experimental condition, a control condition, a set of independent variables, a set of dependent variables, and a set of measures for each dependent variable. The experimental procedure consists of systematically varying the independent variables and measuring the dependent variables for the experimental and control conditions to determine whether the outcome of the experiment supports or refutes the research hypothesis.
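
One way to make these elements concrete is to record them in a small data structure. The sketch below is illustrative, not a required format; its values are taken from the gain-ratio example that follows.

```python
from dataclasses import dataclass

# A sketch of the elements of an experiment as a data structure; the
# example values (attribute-selection methods, accuracy) come from the
# running gain-ratio example, not from any prescribed design.
@dataclass
class Experiment:
    experimental_condition: str   # the method under test
    control_conditions: list      # baselines to compare against
    independent_variables: dict   # factors we systematically vary
    dependent_variables: list     # outcomes we observe
    measures: dict                # how each dependent variable is measured

exp = Experiment(
    experimental_condition="gain ratio",
    control_conditions=["Gini index", "misclassification rate"],
    independent_variables={"attribute selection": ["gain ratio", "Gini index",
                                                   "misclassification rate"]},
    dependent_variables=["performance on testing data"],
    measures={"performance on testing data": "accuracy"},
)

# The procedure: run each setting of the independent variable and record
# the measured dependent variables for the experimental and control conditions.
for method in exp.independent_variables["attribute selection"]:
    role = "experimental" if method == exp.experimental_condition else "control"
    print(role, method)
```

Writing the design down this way, before running anything, makes it harder to confuse what you vary with what you measure.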

As an example, assume that I hypothesize that the gain ratio is the best method for attribute selection for tree induction. To design an experiment to test this hypothesis, I select as the independent variable the method of attribute selection. The experimental condition is a tree-induction algorithm that uses the gain ratio; as control conditions, I use the same algorithm with other attribute-selection methods, such as the Gini index and the misclassification rate. I also select the data sets on which I will run the algorithm; for example, I could select a large number of data sets from the UCI Repository.

I select as the dependent variable the performance of the induced trees on testing data, and I decide to measure performance using accuracy.

To conduct this experiment, my experimental method consists of using ten-fold cross-validation to evaluate the decision-tree algorithms over all of the data sets. If the accuracy for all runs of the algorithm with the gain ratio is higher than that for all runs of the algorithm with the other attribute-selection methods, then this outcome supports my hypothesis. Otherwise, the outcome refutes my hypothesis.
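
The fold-splitting step of this protocol can be sketched as follows; `k_fold_indices` is a hypothetical helper (in practice you would use a library routine such as scikit-learn's `KFold`), and the learner and accuracy computation are elided.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Partition example indices 0..n-1 into k disjoint test folds.

    For each fold, the remaining indices form the training set, so every
    example is tested exactly once across the k runs.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)          # shuffle once, then deal out folds
    folds = [indices[i::k] for i in range(k)]
    for test in folds:
        held_out = set(test)
        train = [i for i in indices if i not in held_out]
        yield train, test

# With 100 examples, each of the 10 runs trains on 90 and tests on 10.
for train, test in k_fold_indices(100):
    assert len(train) == 90 and len(test) == 10
```

Fixing the seed makes the folds reproducible, which matters for the reproducibility criterion described below.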

It is unlikely that *all* runs will be superior. What do we conclude if 60% of the runs support our hypothesis? What about 50%? For any percentage, are the observed differences statistically significant?
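
One common check, assuming matched scores (e.g., per-data-set accuracies for two conditions), is a paired t test. The sketch below computes the t statistic directly from the differences; the accuracy values are hypothetical. In practice you would compare t against a critical value for n - 1 degrees of freedom, or use a library routine such as `scipy.stats.ttest_rel`.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(a, b):
    """Paired t statistic for matched scores from two conditions.

    a[i] and b[i] are accuracies of the two methods on the same data set
    (or the same cross-validation run), so the test operates on differences.
    """
    d = [x - y for x, y in zip(a, b)]
    return mean(d) / (stdev(d) / math.sqrt(len(d)))

# Hypothetical per-data-set accuracies for two attribute-selection methods.
gain_ratio = [0.91, 0.88, 0.95, 0.90, 0.87, 0.93]
gini_index = [0.89, 0.88, 0.92, 0.91, 0.85, 0.90]
t = paired_t_statistic(gain_ratio, gini_index)   # about 2.24 for these values
```

Pairing matters: testing the differences removes the per-data-set variation that an unpaired test would wrongly count against the methods.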

Some experiments include a randomized control. For example, we might implement a decision tree algorithm that randomly selects attributes at each level.
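
Such a randomized control can be as simple as the following sketch; the attribute names are invented for illustration, and the rest of the tree-growing procedure is elided.

```python
import random

def random_attribute_selector(attributes, rng):
    """A randomized control for attribute selection: instead of scoring
    attributes with the gain ratio or the Gini index, pick one uniformly
    at random at each level of the tree."""
    return rng.choice(attributes)

rng = random.Random(42)                      # seed for reproducibility
attrs = ["outlook", "temperature", "humidity", "windy"]
chosen = random_attribute_selector(attrs, rng)
assert chosen in attrs
```

If an "informed" selection method cannot beat this random baseline, the informed scoring is doing little work.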

Some experiments examine confounding variables. For example, does one measure affect the pruning algorithm in a way that another measure does not? How do we measure or control for this confounding variable?

Submit a report that details your research hypothesis, experimental design, analysis, and conclusions. The report must include a bibliography with complete entries. The report must be in PDF, must not exceed five pages, must have margins of at least one inch, and must use a font size of at least ten points. Along with the report, submit any code that you write; you do not have to submit data sets unless they are of your own design. Upload everything as a zip file to Blackboard. If it is late, I'll deduct 1% for every minute.

To assess your project, I will consider three aspects of your study. I will consider the degree to which someone skilled in the art of machine learning could reproduce your study and its results based on the written report. I will consider the soundness of the experimental methodology that you used to conduct the study. I will evaluate the completeness of the experimental study that you performed. For example, a study consisting of two algorithms and one data set probably will not pass muster.

Here are some ideas for projects:

- Measure naive Bayes' sensitivity to irrelevant attributes
- Measure the effect of the level of pruning with C4.5 (or J48)
- Measure naive Bayes' sensitivity to violations of the assumption of conditional independence among attributes
- Measure the degree to which bagging is superior to a single-model method
- Measure the degree to which boosting is superior to a single-model method
- Measure the degree to which a random forest is superior to a single decision tree
- For a version of naive Bayes that handles numeric attributes by assuming they are normally distributed, measure naive Bayes' sensitivity to violations of this assumption.
- Reproduce and extend John and Langley's study of naive Bayes and flexible Bayes
- With naive Bayes, is it better to assume numeric attributes are normally distributed or map their values into discrete intervals?
- Measure differences in performance between Ripper and OneR
- Measure differences in performance between C4.5 and decision stumps
- Characterize the performance of different methods for handling unobserved values in training data for naive Bayes
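
As an illustration of the first idea, the experimental manipulation (not the full study) can be sketched as follows; the learner and evaluation loop are elided, and the feature values are invented.

```python
import random

def add_irrelevant_attributes(rows, k, seed=0):
    """Append k uniformly random (hence irrelevant) numeric attributes to
    every example.  Plotting a learner's cross-validated accuracy as k
    grows characterizes its sensitivity to irrelevant attributes."""
    rng = random.Random(seed)
    return [row + [rng.random() for _ in range(k)] for row in rows]

# Tiny invented data set: each row is one example's attribute values.
data = [[5.1, 3.5], [4.9, 3.0], [6.2, 3.4]]
noisy = add_irrelevant_attributes(data, k=3)
assert all(len(row) == 5 for row in noisy)   # 2 original + 3 irrelevant
```

Here the number of irrelevant attributes, k, is the independent variable; accuracy is again the measured dependent variable.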