

Poster

LACIE: Listener-Aware Finetuning for Calibration in Large Language Models

Elias Stengel-Eskin · Peter Hase · Mohit Bansal

East Exhibit Hall A-C #4503
Wed 11 Dec 11 a.m. PST — 2 p.m. PST

Abstract:

When answering questions, large language models (LLMs) can convey not only an answer to the question, but a level of confidence about the answer being correct. This includes explicit markers of confidence (e.g. giving a numeric confidence score) as well as implicit markers, like using an authoritative tone or elaborating with additional knowledge of a subject. For LLMs to be trustworthy sources of knowledge, the confidence they convey should match their actual expertise on a topic; however, this is currently not the case, with most models tending towards overconfidence. To calibrate both implicit and explicit confidence markers, we introduce a pragmatic, listener-aware finetuning method (LACIE) that directly models the listener, considering not only whether an answer is right, but whether it will be accepted by a listener. Specifically, we cast calibration as a preference optimization problem, creating data via a two-player speaker-listener game, where a speaker model’s outputs are judged by a simulated listener. We then finetune three different LLMs (Mistral-7B, Llama3-8B, Llama3-70B) with LACIE, and show that the resulting models are better calibrated on TriviaQA with respect to a simulated listener. Crucially, these trends transfer to human listeners, helping them correctly estimate model correctness: we conduct a human evaluation where annotators accept or reject an LLM’s answers to trivia questions, finding that training with LACIE results in 47% fewer incorrect answers being accepted while maintaining acceptance of correct answers. Furthermore, LACIE generalizes to another dataset, resulting in a large increase in truthfulness on TruthfulQA when trained on TriviaQA. Our analysis indicates that LACIE leads to a better separation in confidence between correct and incorrect examples. Qualitatively, we find that a LACIE-trained model hedges more when uncertain and adopts implicit cues to signal certainty when it is correct, such as using an authoritative tone or including details. Finally, finetuning with our listener-aware method leads to an emergent increase in model abstention (e.g. saying “I don’t know”) for answers that are likely to be wrong. We include our code as a supplement.
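The abstract describes the data-creation step (a two-player speaker-listener game whose judgments feed preference optimization) only at a high level. Below is a minimal Python sketch of how such listener-judged preference pairs might be assembled; the callables `speaker_generate`, `listener_accepts`, and `is_correct` are hypothetical placeholders, and the simplified preference rule (prefer answers whose listener acceptance matches their actual correctness) is an assumption for illustration, not the authors' exact pipeline.

```python
from typing import Callable, Iterable


def build_preference_pairs(
    questions: Iterable[str],
    gold_answers: Iterable[str],
    speaker_generate: Callable[[str], str],        # samples one answer from the speaker LLM (hypothetical)
    listener_accepts: Callable[[str, str], bool],  # simulated listener: accept or reject an answer (hypothetical)
    is_correct: Callable[[str, str], bool],        # checks an answer against the gold answer (hypothetical)
    n_samples: int = 8,
) -> list[dict]:
    """Sample several speaker answers per question, judge each with a simulated
    listener, and form DPO-style preference pairs. As a simplifying assumption,
    an answer is preferred when the listener's verdict matches reality:
    accepted-and-correct, or rejected-and-incorrect."""
    pairs = []
    for question, gold in zip(questions, gold_answers):
        calibrated, miscalibrated = [], []
        for _ in range(n_samples):
            answer = speaker_generate(question)
            accepted = listener_accepts(question, answer)
            correct = is_correct(answer, gold)
            # Group answers by whether listener acceptance matched correctness.
            (calibrated if accepted == correct else miscalibrated).append(answer)
        # Pair each well-calibrated answer against a miscalibrated one
        # to produce (prompt, chosen, rejected) triples for preference training.
        for chosen in calibrated:
            for rejected in miscalibrated:
                pairs.append({"prompt": question, "chosen": chosen, "rejected": rejected})
    return pairs
```

The resulting triples could then be fed to any standard preference-optimization trainer; the key property captured by the sketch is that preferences depend on the simulated listener's acceptance, not on correctness alone.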
