

LMdiff: A Visual Diff Tool to Compare Language Models

Hendrik Strobelt · Benjamin Hoover · Arvind Satyanarayan · Sebastian Gehrmann


Recently, large language models (LMs) have been shown to sample mostly coherent long-form text. This astonishing level of fluency has driven an increasing interest in understanding how these models work and, in particular, how to interpret and evaluate them. Additionally, the growing use of sophisticated LM frameworks has lowered the threshold for users to train new models or to fine-tune existing models for transfer learning. However, selecting the best LM from the expanding selection of pre-trained deep LM architectures is challenging, as there are few tools available to qualitatively compare models for specialized use cases, e.g., to answer questions like: "What parts of a domain-specific text can the fine-tuned model capture better than the general model?"

We introduce LMdiff: an interactive visual analysis tool for comparing LMs by qualitatively inspecting concrete samples generated by another model or drawn from a reference corpus. We provide an offline method to search for interesting samples, a live demo, and source code for the demo session that supports multiple models and allows users to upload their own example text.
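The kind of per-sample comparison described above can be illustrated with a minimal sketch: given the per-token probabilities that two models assign to the same text, the difference in token log-probabilities marks where one model "captures" the text better than the other. The token list and probability values below are purely hypothetical, and this is an assumed simplification of the comparison signal, not LMdiff's actual implementation.

```python
import math

def logprob_diff(tokens, probs_a, probs_b):
    """Per-token log-probability difference between two models.

    Positive values mark tokens that model A finds more likely than
    model B -- the kind of signal a visual diff tool can highlight.
    """
    return [
        (tok, math.log(pa) - math.log(pb))
        for tok, pa, pb in zip(tokens, probs_a, probs_b)
    ]

# Hypothetical per-token probabilities from a general model and a
# model fine-tuned on clinical text (values invented for illustration).
tokens = ["The", "patient", "was", "intubated"]
general = [0.20, 0.01, 0.30, 0.0005]
finetuned = [0.20, 0.05, 0.30, 0.02]

for tok, d in logprob_diff(tokens, finetuned, general):
    print(f"{tok:>10s}  {d:+.2f}")
```

Tokens with a large positive difference (here, the domain-specific words) are exactly the "interesting samples" an offline search could surface for inspection.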
