Consistent Binary Classification with Generalized Performance Metrics
Sanmi Koyejo · Nagarajan Natarajan · Pradeep Ravikumar · Inderjit Dhillon

Wed Dec 10 04:00 PM -- 08:59 PM (PST) @ Level 2, room 210D

Performance metrics for binary classification are designed to capture tradeoffs between four fundamental population quantities: true positives, false positives, true negatives and false negatives. Despite significant interest from theoretical and applied communities, little is known about either optimal classifiers or consistent algorithms for optimizing binary classification performance metrics beyond a few special cases. We consider a fairly large family of performance metrics given by ratios of linear combinations of the four fundamental population quantities. This family includes many well-known binary classification metrics such as classification accuracy, AM measure, F-measure and the Jaccard similarity coefficient as special cases. Our analysis identifies the optimal classifiers as the sign of the thresholded conditional probability of the positive class, with a performance metric-dependent threshold. The optimal threshold can be constructed using simple plug-in estimators when the performance metric is a linear combination of the population quantities, but alternative techniques are required for the general case. We propose two algorithms for estimating the optimal classifiers, and prove their statistical consistency. Both algorithms are straightforward modifications of standard approaches that address the key challenge of optimal threshold selection, and are thus simple to implement in practice. The first algorithm combines a plug-in estimate of the conditional probability of the positive class with optimal threshold selection. The second algorithm leverages recent work on calibrated asymmetric surrogate losses to construct candidate classifiers. We present empirical comparisons between these algorithms on benchmark datasets.
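
The plug-in approach described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that estimated positive-class probabilities (here called `eta_hat`) are already available from some probabilistic model, uses the F-measure as the example metric, and picks the threshold by exhaustive search over the observed scores. The function names `f_measure` and `select_threshold` are hypothetical, and in practice the threshold would typically be chosen on data held out from the probability estimation step.

```python
import numpy as np

def f_measure(y_true, y_pred):
    """F1 score computed from true positives, false positives and false negatives."""
    tp = np.sum((y_true == 1) & (y_pred == 1))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom > 0 else 0.0

def select_threshold(eta_hat, y_true, metric=f_measure):
    """Return the threshold on eta_hat that maximizes the metric on this sample.

    eta_hat: estimated P(y = 1 | x) for each example (assumed to be given).
    Only the distinct observed scores are tried as candidate thresholds,
    since the thresholded predictions change only at those values.
    """
    candidates = np.unique(eta_hat)
    scores = [metric(y_true, (eta_hat >= t).astype(int)) for t in candidates]
    best = int(np.argmax(scores))
    return candidates[best], scores[best]

# Toy data, made up for illustration only.
eta_hat = np.array([0.1, 0.4, 0.35, 0.8])
y_true = np.array([0, 0, 1, 1])
threshold, score = select_threshold(eta_hat, y_true)
```

On this toy sample the search prefers a threshold below 0.5, which reflects the abstract's point that the best threshold depends on the metric rather than being fixed at one half.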

Author Information

Sanmi Koyejo (University of Illinois at Urbana-Champaign & Google Research)

Sanmi (Oluwasanmi) Koyejo is an Assistant Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign. Koyejo's research interests are in the development and analysis of probabilistic and statistical machine learning techniques motivated by, and applied to, various modern big data problems. He is particularly interested in the analysis of large-scale neuroimaging data. Koyejo completed his Ph.D. in Electrical Engineering at the University of Texas at Austin, advised by Joydeep Ghosh, and completed postdoctoral research at Stanford University with a focus on developing machine learning techniques for neuroimaging data. His postdoctoral research was primarily with Russell A. Poldrack and Pradeep Ravikumar. Koyejo has been the recipient of several awards, including the outstanding NCE/ECE student award, a best student paper award from the Conference on Uncertainty in Artificial Intelligence (UAI), and a trainee award from the Organization for Human Brain Mapping (OHBM).

Nagarajan Natarajan (Microsoft Research, India)
Pradeep Ravikumar (Carnegie Mellon University)
Inderjit Dhillon (UT Austin & Amazon)