Recently, large language models (LMs) have been shown to sample mostly coherent long-form text. This astonishing level of fluency has driven increasing interest in understanding how these models work and, in particular, how to interpret and evaluate them. Additionally, the growing use of sophisticated LM frameworks has lowered the barrier for users to train new models or to fine-tune existing models for transfer learning. However, selecting the best LM from the expanding set of pre-trained deep LM architectures is challenging, as few tools are available to qualitatively compare models for specialized use cases, e.g., to answer questions like: "What parts of a domain-specific text can the fine-tuned model capture better than the general model?"
We introduce LMdiff: an interactive visual analysis tool for comparing LMs by qualitatively inspecting concrete samples generated by another model or drawn from a reference corpus. We provide an offline method to search for interesting samples, a live demo, and source code for the demo session that supports multiple models and allows users to upload their own example text.
Author Information
Hendrik Strobelt (IBM Research)
Benjamin Hoover (IBM Research)
Arvind Satyanarayan (MIT CSAIL)
Sebastian Gehrmann (Harvard University)
More from the Same Authors
-
2021 : Automatic Construction of Evaluation Suites for Natural Language Generation Datasets »
Simon Mille · Kaustubh Dhole · Saad Mahamood · Laura Perez-Beltrachini · Varun Prashant Gangal · Mihir Kale · Emiel van Miltenburg · Sebastian Gehrmann -
2021 : SynthBio: A Case Study in Faster Curation of Text Datasets »
Ann Yuan · Daphne Ippolito · Vitaly Nikolaev · Chris Callison-Burch · Andy Coenen · Sebastian Gehrmann -
2022 : A Universal Abstraction for Hierarchical Hopfield Networks »
Benjamin Hoover · Duen Horng Chau · Hendrik Strobelt · Dmitry Krotov -
2023 Poster: Energy-based Attention for Associative Memory »
Benjamin Hoover · Yuchen Liang · Bao Pham · Rameswar Panda · Hendrik Strobelt · Duen Horng Chau · Mohammed Zaki · Dmitry Krotov -
2021 : Interactive Exploration for 60 Years of AI Research »
Hendrik Strobelt · Benjamin Hoover -
2020 Poster: CogMol: Target-Specific and Selective Drug Design for COVID-19 Using Deep Generative Models »
Vijil Chenthamarakshan · Payel Das · Samuel Hoffman · Hendrik Strobelt · Inkit Padhi · Kar Wai Lim · Benjamin Hoover · Matteo Manica · Jannis Born · Teodoro Laino · Aleksandra Mojsilovic -
2020 Demonstration: Shared Interest: Human Annotations vs. AI Saliency »
Angie Boggust · Benjamin Hoover · Arvind Satyanarayan · Hendrik Strobelt -
2020 Poster: Investigating Gender Bias in Language Models Using Causal Mediation Analysis »
Jesse Vig · Sebastian Gehrmann · Yonatan Belinkov · Sharon Qian · Daniel Nevo · Yaron Singer · Stuart Shieber -
2020 Spotlight: Investigating Gender Bias in Language Models Using Causal Mediation Analysis »
Jesse Vig · Sebastian Gehrmann · Yonatan Belinkov · Sharon Qian · Daniel Nevo · Yaron Singer · Stuart Shieber -
2019 Demonstration: exBERT: A Visual Analysis Tool to Explain BERT's Learned Representations »
Benjamin Hoover · Hendrik Strobelt · Sebastian Gehrmann