Useful Confidence Measures: Beyond the Max Score
Gal Yona · Amir Feder · Itay Laish

An important component in deploying machine learning (ML) in safety-critical applications is having a reliable measure of confidence in the ML model's predictions. For a classifier $f$ producing a probability vector $f(x)$ over the candidate classes, the confidence is typically taken to be $\max_i f(x)_i$. This approach is potentially limited, as it disregards the rest of the probability vector. In this work, we derive several confidence measures that depend on information beyond the maximum score, such as margin-based and entropy-based measures, and empirically evaluate their usefulness, focusing on NLP tasks with distribution shifts and Transformer-based models. We show that when models are evaluated on out-of-distribution data "out of the box", using only the maximum score to inform the confidence measure is highly suboptimal. In the post-processing regime (where the scores of $f$ can be improved using additional in-distribution held-out data), this remains true, although the gap is smaller. Overall, our results suggest that entropy-based confidence is a surprisingly useful measure.
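To make the distinction concrete, the following is a minimal sketch of the three families of measures named in the abstract, computed from a probability vector $f(x)$. The abstract does not give the exact definitions used in the paper, so the function names and formulas here are assumptions: the margin is taken as the gap between the two largest scores, and the entropy-based measure as one minus the Shannon entropy normalized by $\log K$ for $K$ classes.

```python
import numpy as np


def max_score_confidence(probs):
    """Standard confidence: the largest entry of the probability vector."""
    probs = np.asarray(probs, dtype=float)
    return probs.max(axis=-1)


def margin_confidence(probs):
    """Assumed margin form: gap between the largest and second-largest scores."""
    probs = np.asarray(probs, dtype=float)
    top_two = np.sort(probs, axis=-1)[..., -2:]  # ascending order, keep last two
    return top_two[..., 1] - top_two[..., 0]


def entropy_confidence(probs, eps=1e-12):
    """Assumed entropy form: 1 - H(p) / log(K), so higher means more confident."""
    probs = np.asarray(probs, dtype=float)
    num_classes = probs.shape[-1]
    # eps guards against log(0) for classes with zero probability.
    entropy = -np.sum(probs * np.log(probs + eps), axis=-1)
    return 1.0 - entropy / np.log(num_classes)


# Two hypothetical predictions that share the same maximum score of 0.5.
p_close_runner_up = [0.50, 0.49, 0.01]
p_spread_out = [0.50, 0.25, 0.25]

for name, p in [("close runner-up", p_close_runner_up), ("spread out", p_spread_out)]:
    print(
        name,
        float(max_score_confidence(p)),
        float(margin_confidence(p)),
        float(entropy_confidence(p)),
    )
```

Both example vectors receive the same max-score confidence, yet the margin and entropy measures rank them differently from each other. This is the kind of information in the rest of the probability vector that the maximum score alone cannot capture.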

#### Author Information

##### Amir Feder (Columbia University)

Amir Feder is a Postdoctoral Research Scientist in the Data Science Institute, working with Professor David Blei on causal inference and natural language processing. His research seeks to develop methods that integrate causality into natural language processing, and use them to build linguistically-informed algorithms for predicting and understanding human behavior. Through the paradigm of causal machine learning, Amir aims to build bridges between machine learning and the social sciences. Before joining Columbia, Amir received his PhD from the Technion, where he was advised by Roi Reichart and worked closely with Uri Shalit. In a previous (academic) life, Amir was an economics, statistics and history student at Tel Aviv University, the Hebrew University of Jerusalem and Northwestern University. Amir was the organizer of the First Workshop on Causal Inference and NLP (CI+NLP) at EMNLP 2021.