On machine learning, ROC analysis, and statistical tests of significance

Marcus A. Maloof

ROC analysis is increasingly used as an evaluation methodology in machine learning and pattern recognition. Researchers have used ANOVA to determine whether the results of such analyses are statistically significant. In the medical decision-making community, however, the prevailing method is LABMRMC. Although LABMRMC also uses ANOVA, it first applies the jackknife method to account for case-sample variance. To determine whether these two tests make the same decisions regarding statistical significance, we conducted a Monte Carlo simulation using several problems derived from Gaussian distributions, three machine-learning algorithms, ROC analysis, ANOVA, and LABMRMC. Results suggest that the decisions these tests make are not the same, even for simple problems. The larger issue is that, because ANOVA does not account for case-sample variance, one cannot generalize experimental results to the population from which the data were drawn.
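To illustrate the jackknife step mentioned in the abstract, the following is a minimal sketch, not the LABMRMC implementation. It assumes a single classifier, uses the Mann-Whitney (trapezoidal) estimate of the area under the ROC curve, and computes leave-one-case-out pseudovalues, which are the kind of per-case quantities a subsequent ANOVA could take as input. The function names and the example scores are hypothetical.

```python
def auc(neg, pos):
    """Mann-Whitney estimate of the area under the ROC curve.

    Counts the fraction of (positive, negative) score pairs in which the
    positive case scores higher; ties count as one half.
    """
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def jackknife_pseudovalues(neg, pos):
    """Leave-one-case-out pseudovalues of the AUC (illustrative only).

    For n cases in total, the pseudovalue for case i is
    n * AUC(all cases) - (n - 1) * AUC(all cases except i).
    """
    n = len(neg) + len(pos)
    full = auc(neg, pos)
    pseudo = []
    for i in range(len(neg)):  # drop each negative case in turn
        loo = auc(neg[:i] + neg[i + 1:], pos)
        pseudo.append(n * full - (n - 1) * loo)
    for j in range(len(pos)):  # drop each positive case in turn
        loo = auc(neg, pos[:j] + pos[j + 1:])
        pseudo.append(n * full - (n - 1) * loo)
    return pseudo


# Hypothetical classifier scores for negative and positive cases.
neg_scores = [0.1, 0.4, 0.35, 0.8]
pos_scores = [0.9, 0.6, 0.7, 0.3]

pv = jackknife_pseudovalues(neg_scores, pos_scores)
jackknife_auc = sum(pv) / len(pv)  # jackknife estimate of the AUC
```

The spread of the pseudovalues reflects how much the AUC depends on which cases happened to be sampled, which is the case-sample variance that an ANOVA applied directly to per-run AUC values would not see.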

Paper available in PostScript (gzipped) and PDF.

@inproceedings{maloof.icpr.02,
  author = "Maloof, M.A.",
  title = "On machine learning, {ROC} analysis, and statistical tests of significance",
  booktitle = "{Proceedings of the Sixteenth International Conference
    on Pattern Recognition}",
  year = 2002,
  pages = "204--207",
  publisher = "IEEE Press",
  address = "Los Alamitos, CA"
}