

Poster

ContextCite: Attributing Model Generation to Context

Benjamin Cohen-Wang · Harshay Shah · Kristian Georgiev · Aleksander Madry

East Exhibit Hall A-C #3407
[ Project Page ]
Wed 11 Dec 11 a.m. PST — 2 p.m. PST

Abstract:

How do language models use information provided as context when generating a response? Can we infer whether a particular generated statement is actually grounded in the context, a misinterpretation, or fabricated? To help answer these questions, we introduce the problem of context attribution: pinpointing the parts of the context (if any) that led a model to generate a particular statement. We then present ContextCite, a simple and scalable method for context attribution that can be applied on top of any existing language model. Finally, we showcase the utility of ContextCite through three applications: (1) helping verify generated statements, (2) improving response quality by pruning the context, and (3) detecting poisoning attacks. We provide code for ContextCite at https://github.com/MadryLab/context-cite.
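
Below is a minimal usage sketch based on the linked repository's README. The ContextCiter class, the from_pretrained signature, the get_attributions method, and the model name are assumptions drawn from that README and may differ from the released API; the context and query strings are illustrative.

# Minimal sketch, assuming the pip-installable `context-cite` package
# exposes `ContextCiter` as in the repository README (an assumption).
from context_cite import ContextCiter

model_name = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"  # any HF causal LM (assumed example)
context = (
    "ContextCite attributes statements in a model's response back to "
    "the parts of the provided context that led to their generation."
)
query = "What does ContextCite do?"

# Wraps an existing language model; no fine-tuning is required.
cc = ContextCiter.from_pretrained(model_name, context, query)

# Generate a response, then attribute it to sources within the context.
print(cc.response)
print(cc.get_attributions(as_dataframe=True, top_k=5))

In this sketch, get_attributions returns a ranking of context sources by their estimated influence on the generated statement, which supports the paper's three applications (verification, context pruning, and poison detection).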
