We introduce a method to measure uncertainty in large language models. For tasks like question answering, it is essential to know when we can trust the natural language outputs of foundation models. We show that measuring uncertainty in natural language is challenging because of semantic equivalence: different sentences can mean the same thing. To overcome these challenges, we introduce semantic entropy, an entropy that incorporates linguistic invariances created by shared meanings. Our method is unsupervised, uses only a single model, and requires no modifications to off-the-shelf language models. In comprehensive ablation studies, we show that semantic entropy is more predictive of model accuracy on question answering datasets than comparable baselines.
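The idea behind semantic entropy can be illustrated with a small sketch. Instead of computing entropy over distinct answer strings, sampled answers are grouped into meaning classes c. Each class gets probability p(c | x) = Σ_{s ∈ c} p(s | x), and the entropy is taken over classes: SE(x) = -Σ_c p(c | x) log p(c | x). The code below is a simplified illustration, not the authors' implementation. Several parts are assumptions: the helper names (`cluster_by_meaning`, `semantic_entropy`, `toy_equivalent`), the renormalisation over only the sampled classes, and the string-normalising equivalence test. That test is a hypothetical stand-in for a real semantic-equivalence check, such as one based on a natural language inference model.

```python
import math
from typing import Callable, List, Sequence

Equivalence = Callable[[str, str], bool]


def cluster_by_meaning(answers: Sequence[str], equivalent: Equivalence) -> List[List[int]]:
    """Greedily group answer indices into meaning classes.

    Each new answer is compared with the first member of every existing
    class and joins the first class it matches; otherwise it starts a new one.
    """
    clusters: List[List[int]] = []
    for i, answer in enumerate(answers):
        for cluster in clusters:
            if equivalent(answers[cluster[0]], answer):
                cluster.append(i)
                break
        else:
            clusters.append([i])
    return clusters


def semantic_entropy(answers: Sequence[str], log_probs: Sequence[float],
                     equivalent: Equivalence) -> float:
    """Entropy over meaning classes rather than over raw strings."""
    clusters = cluster_by_meaning(answers, equivalent)
    # Probability mass of each meaning class = sum of its sequence probabilities.
    masses = [sum(math.exp(log_probs[i]) for i in cluster) for cluster in clusters]
    # Assumption: renormalise over the sampled classes only.
    total = sum(masses)
    probs = [m / total for m in masses]
    return -sum(p * math.log(p) for p in probs if p > 0)


def toy_equivalent(a: str, b: str) -> bool:
    """Hypothetical stand-in for a semantic-equivalence test.

    Treats two answers as equivalent if they match after lowercasing,
    removing periods, and dropping articles.
    """
    def normalise(s: str) -> str:
        words = s.lower().replace(".", "").split()
        return " ".join(w for w in words if w not in {"a", "an", "the"})

    return normalise(a) == normalise(b)


if __name__ == "__main__":
    answers = ["Paris", "Paris.", "paris", "London"]
    log_probs = [math.log(0.4), math.log(0.2), math.log(0.2), math.log(0.2)]
    print(cluster_by_meaning(answers, toy_equivalent))
    print(semantic_entropy(answers, log_probs, toy_equivalent))
```

In this toy example, the three "Paris" variants collapse into one class with mass 0.8, and "London" forms a second class with mass 0.2. The resulting entropy of about 0.50 nats is well below the entropy over the four distinct strings. This is the effect of semantic equivalence that the abstract describes.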
Author Information
Lorenz Kuhn (University of Oxford)
Yarin Gal (University of Oxford)
Sebastian Farquhar (DeepMind)
More from the Same Authors
2020 : Paper 40: Real2sim: Automatic Generation of Open Street Map Towns For Autonomous Driving Benchmarks
Panagiotis Tigas · Yarin Gal
2022 : Discovering Long-period Exoplanets using Deep Learning with Citizen Science Labels
Shreshth A Malik · Nora Eisner · Chris Lintott · Yarin Gal
2022 : TranceptEVE: Combining Family-specific and Family-agnostic Models of Protein Sequences for Improved Fitness Prediction
Pascal Notin · Lodevicus van Niekerk · Aaron Kollasch · Daniel Ritter · Yarin Gal · Debora Marks
2022 : Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning?
Gunshi Gupta · Tim G. J. Rudner · Rowan McAllister · Adrien Gaidon · Yarin Gal
2022 : What 'Out-of-distribution' Is and Is Not
Sebastian Farquhar · Yarin Gal
2022 Poster: Tractable Function-Space Variational Inference in Bayesian Neural Networks
Tim G. J. Rudner · Zonghao Chen · Yee Whye Teh · Yarin Gal
2022 Poster: Scalable Sensitivity and Uncertainty Analyses for Causal-Effect Estimates of Continuous-Valued Interventions
Andrew Jesson · Alyson Douglas · Peter Manshausen · Maëlys Solal · Nicolai Meinshausen · Philip Stier · Yarin Gal · Uri Shalit
2022 Poster: Interventions, Where and How? Experimental Design for Causal Models at Scale
Panagiotis Tigas · Yashas Annadani · Andrew Jesson · Bernhard Schölkopf · Yarin Gal · Stefan Bauer
2022 Poster: Active Surrogate Estimators: An Active Learning Approach to Label-Efficient Model Evaluation
Jannik Kossen · Sebastian Farquhar · Yarin Gal · Thomas Rainforth
2021 Workshop: Bayesian Deep Learning
Yarin Gal · Yingzhen Li · Sebastian Farquhar · Christos Louizos · Eric Nalisnick · Andrew Gordon Wilson · Zoubin Ghahramani · Kevin Murphy · Max Welling
2021 : Evaluating Approximate Inference in Bayesian Deep Learning + Q&A
Andrew Gordon Wilson · Pavel Izmailov · Matthew Hoffman · Yarin Gal · Yingzhen Li · Melanie F. Pradier · Sharad Vikram · Andrew Foong · Sanae Lotfi · Sebastian Farquhar
2018 Poster: BRUNO: A Deep Recurrent Model for Exchangeable Data
Iryna Korshunova · Jonas Degrave · Ferenc Huszar · Yarin Gal · Arthur Gretton · Joni Dambre