Skewed data sets, Receiver Operating Characteristic analysis, and learning to detect rooftops

Marcus A. Maloof

In this talk, I will discuss the use of machine-learning algorithms for improving a computer vision system that detects buildings in overhead images. Our studies concentrated on rooftop detection, one stage of the overall detection process. A traditional evaluation of several learning algorithms led to unsatisfying and confusing results, which we attributed to skewed data sets and unequal but unknown costs of misclassification. We reapproached the problem using Receiver Operating Characteristic (ROC) analysis, an evaluation methodology that led us to completely different conclusions about which learning algorithms performed best for this detection task. Our findings and the findings of others call into question the validity of any empirical study that uses percent correct as its sole measure of performance.
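To make the contrast between percent correct and ROC analysis concrete, here is a minimal sketch in Python. It is not taken from the talk: the labels, scores, threshold of 0.5, and function names are all invented for illustration, and the area under the ROC curve is used here as one common summary of a curve. On a skewed toy set (2 positives, 18 negatives), a classifier that always answers "not a rooftop" scores higher on percent correct than a classifier that ranks the positives well, while the area under the ROC curve orders the two the other way.

```python
# Illustrative only: toy labels and scores, not data from the rooftop study.

def accuracy(labels, predictions):
    """Fraction of examples whose predicted label matches the true label."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)

def roc_points(labels, scores):
    """Sweep a threshold from high to low and collect (FPR, TPR) points."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

# Skewed toy set: 2 positives (rooftops), 18 negatives.
labels = [1, 1] + [0] * 18

# Classifier A: always predicts the majority (negative) class.
scores_a = [0.0] * 20
# Classifier B: ranks both positives near the top but raises some false alarms.
scores_b = [0.9, 0.65, 0.7, 0.6, 0.55] + [0.1] * 15

THRESHOLD = 0.5  # arbitrary operating point, assumed for this example
preds_a = [1 if s >= THRESHOLD else 0 for s in scores_a]
preds_b = [1 if s >= THRESHOLD else 0 for s in scores_b]

acc_a, acc_b = accuracy(labels, preds_a), accuracy(labels, preds_b)
auc_a, auc_b = auc(roc_points(labels, scores_a)), auc(roc_points(labels, scores_b))

print(f"A: accuracy={acc_a:.3f}  AUC={auc_a:.3f}")
print(f"B: accuracy={acc_b:.3f}  AUC={auc_b:.3f}")
```

Percent correct rewards classifier A for ignoring the rare class, and it depends on a single threshold whose appropriateness hinges on misclassification costs that, as noted above, may be unknown. The ROC curve instead describes performance across all thresholds, which is why the two measures can rank the same classifiers differently.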

This is joint work with Pat Langley (ISLE & Stanford), Tom Binford (Stanford), and Ram Nevatia (USC).

Slides from the talk are available in PostScript (gzipped) and PDF.