In our work in data mining and
machine learning, we often face
domains in which class distributions are greatly skewed and/or
classification error costs are unequal. In these situations, evaluating
classifiers is very difficult because classification accuracy, the
metric by which most evaluation is currently done, is completely
inadequate. To make things worse, class distributions in these domains
often drift over time, and error costs may be known only approximately.
We've developed a robust framework for evaluating learned classifiers, based on ROC analysis, which enables us to analyze and visualize classification performance separately from assumptions about class distributions and error costs. The method computes the ROC convex hull (ROCCH), supporting clear visual comparisons of classifiers and sensitivity analyses.
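As a rough illustration of the central computation, here is a minimal sketch of building the ROC convex hull from a set of classifiers' (FP rate, TP rate) points. The classifier points are made-up illustrative values, not taken from the papers; the hull always includes the trivial "always negative" (0, 0) and "always positive" (1, 1) classifiers.

```python
def rocch(points):
    """Return the upper convex hull of ROC points (the ROCCH).

    Points are (FP rate, TP rate) pairs. The trivial classifiers
    (0, 0) and (1, 1) are always part of the hull.
    """
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # Discard the last hull point while it lies on or below the
        # segment from the point before it to p (not on the upper hull).
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            cross = (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1)
            if cross >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

# Illustrative classifiers; (0.4, 0.7) and (0.5, 0.85) fall below the hull.
print(rocch([(0.1, 0.6), (0.3, 0.8), (0.5, 0.85), (0.4, 0.7)]))
```

Classifiers whose points fall strictly below the hull are never optimal under any combination of class distribution and error costs, which is the basis of the ROCCH method's robustness.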

Here is an online talk (PowerPoint) on the ROC Convex Hull based on a presentation I gave at HP Labs. The slides are taken from various talks by Foster and me, and include a few from Foster's data mining evaluation tutorials. I've tried to get the web version of the talk to look like the real slides, but some of the characters are garbled (notably, double quotes appear as question marks in some places).
Rob Holte also has a talk on ROC Convex Hull for Comparing Machine Learning Schemes.
A webpage on ROCCH DET curves, including a MATLAB toolkit, by Niko Brummer.
A small script to generate ROC points from a C4.5 tree.
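That script is C4.5-specific and not reproduced here, but the underlying idea is a generic threshold sweep: sort the test examples by the classifier's score and record an (FP rate, TP rate) point at each cut. The following sketch uses hypothetical (score, label) pairs rather than actual C4.5 output:

```python
def roc_points(scores, labels):
    """Return (FP rate, TP rate) points from scored predictions.

    Assumes both classes are present; labels are 1 (positive) / 0 (negative).
    Ties in score are not collapsed into a single point in this sketch.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    tp = fp = 0
    points = [(0.0, 0.0)]
    # Walk examples from highest score to lowest; each step in effect
    # lowers the decision threshold past one more example.
    for score, label in sorted(zip(scores, labels), reverse=True):
        if label:
            tp += 1
        else:
            fp += 1
        points.append((fp / neg, tp / pos))
    return points

# Illustrative scores and labels only.
print(roc_points([0.9, 0.8, 0.6, 0.4], [1, 1, 0, 1]))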
These papers describe the ROC Convex Hull, its uses and its properties. The first paper listed is the most complete. The best basic introduction is probably the KDD97 paper.
Abstract: In real-world
environments it usually is difficult to specify target operating
conditions precisely, for example, target misclassification costs. This
uncertainty makes building robust classification systems problematic.
We show that it is possible to build a hybrid classifier that will
perform at least as well as the best available classifier for any
target conditions. In some cases, the performance of the hybrid
actually can surpass that of the best known classifier. This robust
performance extends across a wide variety of comparison frameworks,
including the optimization of metrics such as accuracy, expected cost,
lift, precision, recall, and workforce utilization. The hybrid also is
efficient to build, to store, and to update. The hybrid is based on a
method for the comparison of classifier performance that is robust to
imprecise class distributions and misclassification costs. The ROC
convex hull (ROCCH) method combines techniques from ROC analysis,
decision analysis and computational geometry, and adapts them to the
particulars of analyzing learned classifiers. The method is efficient
and incremental, minimizes the management of classifier performance
data, and allows for clear visual comparisons and sensitivity analyses.
Finally, we point to empirical evidence that a robust hybrid classifier
indeed is needed for many real-world problems.
Postscript version, PDF version.
Abstract: Applications of inductive learning algorithms to real-world data mining problems have shown repeatedly that using accuracy to compare classifiers is not adequate because the underlying assumptions rarely hold. We present a method for the comparison of classifier performance that is robust to imprecise class distributions and misclassification costs. The ROC convex hull method combines techniques from ROC analysis, decision analysis and computational geometry, and adapts them to the particulars of analyzing learned classifiers. The method is efficient and incremental, minimizes the management of classifier performance data, and allows for clear visual comparisons and sensitivity analyses.
Postscript version, PDF version.
(Though this paper is not about the ROC convex hull, it points out problems with using accuracy for evaluating classifiers and it motivates the need for ROC analysis.)
Abstract: We analyze critically the use of classification accuracy to compare classifiers on natural data sets, providing a thorough investigation using ROC analysis, standard machine learning algorithms, and standard benchmark data sets. The results raise serious concerns about the use of accuracy for comparing classifiers and draw into question the conclusions that can be drawn from such studies. In the course of the presentation, we describe and demonstrate what we believe to be the proper use of ROC analysis for comparative studies in machine learning research. We argue that this methodology is preferable both for making practical choices and for drawing scientific conclusions.
Postscript version, PDF version.
Abstract: In real-world environments, it is usually difficult to specify target operating conditions precisely. This uncertainty makes building robust classification systems problematic. We show that it is possible to build a hybrid classifier that will perform at least as well as the best available classifier for any target conditions. This robust performance extends across a wide variety of comparison frameworks, including the optimization of metrics such as accuracy, expected cost, lift, precision, recall, and workforce utilization. In some cases, the performance of the hybrid can actually surpass that of the best known classifier. The hybrid is also efficient to build, to store, and to update. Finally, we provide empirical evidence that a robust hybrid classifier is needed for many real-world problems.
Other people's work building upon, or relevant to, the ROC Convex Hull.
Every day, important and complex yes-or-no diagnostic decisions are made throughout medicine, industry and society. Statistical methods of making those choices could dramatically improve the outcomes. A fairly well-written, accessible introduction to ROC curves and their use in decision making.
Abstract: The comparative study of classifier performance is a worthwhile concern in Machine Learning. Empirical comparisons typically examine unbiased estimates of predictive accuracy of different algorithms, the assumption being that the classifier with the highest accuracy would be the "optimal" choice of classifier for the problem. The qualification on optimality is needed here, as choice is restricted to the classifiers being compared, and the estimates are typically subject to sampling errors. Comparisons based on predictive accuracy overlook two important practical concerns, namely (a) that class distributions cannot be specified precisely, so the distribution of classes in the training set is rarely matched exactly on new data; and (b) that the costs of different types of errors may be unequal. Using techniques developed in signal detection, Provost and Fawcett describe an elegant method for the comparative assessment of binary classifiers that takes these considerations into account. Their principal result is that optimal classifiers lie on the edge of the convex hull of points representing the classifier performance in a Lorentz diagram. The authors leave open the question of whether this result extends to classifiers that discriminate between more than two classes. Here we show that the result that optimal classifiers are located on the edge of the convex hull does extend to an arbitrary number of classes. This follows directly from results about convex sets that form the basis of linear programming algorithms.
Abstract: This paper investigates how the splitting criteria and pruning methods of decision tree learning algorithms are influenced by misclassification costs or changes to the class distribution. Splitting criteria that are relatively insensitive to costs (class distributions) are found to perform, in terms of expected misclassification cost, as well as or better than splitting criteria that are cost-sensitive. Consequently, there are two opposite ways of dealing with imbalance. One is to combine a cost-insensitive splitting criterion with a cost-insensitive pruning method to produce a decision tree algorithm little affected by cost or prior class distribution. The other is to grow a cost-independent tree which is then pruned in a cost-sensitive manner.
Abstract: This paper proposes an alternative to ROC representation, in which the expected cost of a classifier is represented explicitly. This expected cost representation maintains many of the advantages of ROC representation, but is easier to understand. It allows the experimenter to immediately see the range of costs and class frequencies where a particular classifier is the best, and quantitatively how much better it is than other classifiers. This paper demonstrates that there is a point/line duality between the two representations. A point in ROC space representing a classifier becomes a line segment spanning the full range of costs and class frequencies. This duality produces equivalent operations in the two spaces, allowing techniques used in ROC analysis to be readily reproduced in the cost space.
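The point/line duality described in this abstract can be sketched concretely. Under the cost-curve convention (my paraphrase, not code from the paper), an ROC point (FP, TP) maps to the line NE(PC) = (1 - TP) · PC + FP · (1 - PC), where PC in [0, 1] is the normalized probability-cost and NE the normalized expected cost:

```python
def cost_line(fp_rate, tp_rate):
    """Dual of an ROC point in cost space: its values at PC = 0 and PC = 1.

    At PC = 0 the normalized expected cost equals the FP rate;
    at PC = 1 it equals the FN rate (1 - TP rate).
    """
    return fp_rate, 1.0 - tp_rate

def expected_cost(fp_rate, tp_rate, pc):
    """Normalized expected cost of the classifier at probability-cost pc."""
    return (1.0 - tp_rate) * pc + fp_rate * (1.0 - pc)

# An ROC point (0.2, 0.8) becomes a line from (0, 0.2) to (1, 0.2):
print(cost_line(0.2, 0.8))
print(expected_cost(0.2, 0.8, 0.5))
```

The lower envelope of these lines in cost space corresponds to the ROC convex hull: for each PC, the minimizing line identifies the best classifier under those operating conditions.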
Abstract: We have been developing a web-based intelligent navigation system, "mondou", using mining of association rules. The associative keywords derived by the mining algorithm are very helpful to search users. At present, based on the experimental results of the mining algorithms, we are trying to extend the search functions of the commercial text database "Open Text". However, in order to derive appropriate keywords at small computing cost, we have to carefully determine several system parameters, such as the Minsup and Minconf threshold values. In this paper, we focus on techniques from ROC graphs to evaluate the characteristics of the derived rules. We propose an ROC analytical model of our search system, and evaluate the performance of the extended association rules. Finally, using the ROC convex hull method, we try to specify the optimal threshold values for deriving effective rules from typical bibliographic data, the INSPEC database.
Kelly Zou maintains an impressive bibliography of ROC papers and articles, mostly from the medical diagnostic literature.