
LMdiff: A Visual Diff Tool to Compare Language Models
Hendrik Strobelt · Benjamin Hoover · Arvind Satyanarayan · Sebastian Gehrmann

Tue Dec 08 06:00 PM -- 06:20 PM & Wed Dec 09 06:00 PM -- 06:20 PM (PST)
Event URL: http://difflm.xyz/

Recently, large language models (LMs) have been shown to sample mostly coherent long-form text. This astonishing level of fluency has driven an increasing interest in understanding how these models work and, in particular, how to interpret and evaluate them. Additionally, the growing availability of sophisticated LM frameworks has lowered the threshold for users to train new models or to fine-tune existing models for transfer learning. However, selecting the best LM from the expanding selection of pre-trained deep LM architectures is challenging, as there are few tools available to qualitatively compare models for specialized use cases, e.g., to answer questions like: "What parts of a domain-specific text can the fine-tuned model capture better than the general model?"

We introduce LMdiff: an interactive visual analysis tool for comparing LMs by qualitatively inspecting concrete samples, either generated by a model or drawn from a reference corpus. We provide an offline method for finding interesting samples, a live demo, and the source code for the demo, which supports multiple models and allows users to upload their own example texts.
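One natural way to compare two LMs on the same text, in the spirit described above, is to contrast the probability each model assigns to every token; tokens where the models diverge most are candidates for "interesting" samples. The sketch below illustrates this idea on hypothetical per-token log-probabilities — it is an illustrative assumption about the comparison, not LMdiff's actual implementation, and the `logprobs_*` values are made up.

```python
def per_token_diff(logprobs_a, logprobs_b):
    """Per-token difference in log-probability between two models
    scoring the same tokenized text (positive = model A more confident)."""
    return [a - b for a, b in zip(logprobs_a, logprobs_b)]

def most_divergent(tokens, diffs, k=2):
    """Rank tokens by absolute disagreement between the two models."""
    ranked = sorted(zip(tokens, diffs), key=lambda td: abs(td[1]), reverse=True)
    return ranked[:k]

# Hypothetical log-probs for the tokens of one sentence under two models.
tokens = ["The", "patient", "was", "diagnosed"]
general_lm = [-1.2, -6.0, -0.8, -5.5]   # general-domain model
finetuned_lm = [-1.1, -2.0, -0.9, -1.5]  # domain-fine-tuned model

diffs = per_token_diff(finetuned_lm, general_lm)
highlights = most_divergent(tokens, diffs)
```

Here the domain-specific tokens ("patient", "diagnosed") show the largest gaps, which is exactly the kind of signal a visual diff could surface for a fine-tuned versus general model.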

Author Information

Hendrik Strobelt (IBM Research)
Benjamin Hoover (IBM Research)
Arvind Satyanarayan (MIT CSAIL)
Sebastian Gehrmann (Harvard University)
