A machine learning researcher's foray into recidivism prediction

Marcus A. Maloof

We discuss an application of machine learning to recidivism prediction. Our initial results motivate the need for a methodology for technique selection for applications that involve unequal but unknown error costs, a skewed data set, or both. Evaluation methodologies traditionally used in machine learning are inadequate for analyzing performance in these situations, although they arise frequently when addressing real-world problems. After discussing the problem of recidivism prediction and the particulars of our data set, we present experimental results that motivate the need to evaluate learning algorithms over a range of error costs. We then describe Receiver Operating Characteristic (ROC) analysis, which has been used extensively in signal detection theory for decades but has only recently begun to filter into machine learning research. With this new perspective, we revisit the recidivism prediction task and present results that contradict those obtained using a traditional method of evaluation.
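
As a rough illustration of the ROC analysis mentioned in the abstract, the sketch below sweeps a decision threshold over classifier scores to trace (false positive rate, true positive rate) points and then estimates the area under the curve with the trapezoid rule. It is not taken from the report: the function names `roc_points` and `auc` are hypothetical, and it assumes binary labels (1 for the positive class, 0 for the negative class) and scores where higher means more likely positive.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) pairs obtained by sweeping a decision threshold.

    Assumes labels are 1 (positive) or 0 (negative) and that a higher
    score means the example is more likely to be positive.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need at least one positive and one negative example")

    # Rank examples from highest to lowest score.
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)

    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Examples with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under a piecewise-linear ROC curve (trapezoid rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

Because each point on the curve corresponds to one threshold, and therefore to one trade-off between the two kinds of error, comparing whole curves (or their areas) avoids committing to a single error cost, which is the situation the abstract describes.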

Paper available in PostScript (gzipped) and PDF.

Slides from the talk available in PostScript (gzipped) and PDF.

@techreport{maloof.tr.99.2,
  author = "Maloof, M.A.",
  title = "A machine learning researcher's foray into recidivism prediction",
  type = "Technical Report",
  number = "CS-99-02",
  month = "July",
  year = 1999,
  institution = "Department of Computer Science, Georgetown University",
  address = "Washington, DC",
}