Poster
Explaining Text Datasets with Language Parameters
Ruiqi Zhong · Heng Wang · Dan Klein · Jacob Steinhardt
West Ballroom A-D #7208
To make sense of massive data, we often first fit simplified models and then interpret the parameters; for example, we cluster the text embeddings and then interpret the mean parameters of each cluster. However, these parameters are often high-dimensional and hard to interpret. To make model parameters directly interpretable, we introduce a family of statistical models (including clustering, time-series, and classification models) parameterized by natural language predicates. For example, a cluster of text about COVID could be directly parameterized by the predicate "discusses COVID". To learn these statistical models effectively, we develop a model-agnostic algorithm that optimizes continuous relaxations of predicate parameters with gradient descent and discretizes them by prompting language models (LMs). We demonstrate its potential by applying it to taxonomize real-world user chat dialogues and characterize how they evolve over time, outputting sophisticated explanations that classical methods struggle to generate.
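The relax-then-discretize idea in the abstract can be illustrated with a toy sketch. This is not the authors' algorithm, only a minimal illustration under several assumptions: texts are embedded as bag-of-words counts, the relaxed predicate is a weight vector fitted by gradient descent on a logistic loss, and the discretization step picks from a small hand-written candidate list instead of prompting a language model. All names here (`fit_relaxed_predicate`, `discretize`, `candidates`) are hypothetical.

```python
import numpy as np

# Toy corpus: the first three texts belong to the target group (label 1).
texts = [
    "covid cases rise again",
    "new covid vaccine approved",
    "covid testing sites open",
    "football team wins final",
    "stock market closes higher",
    "new phone released today",
]
labels = np.array([1, 1, 1, 0, 0, 0], dtype=float)

# Assumption: bag-of-words counts stand in for real text embeddings.
vocab = sorted({w for t in texts for w in t.split()})
index = {w: i for i, w in enumerate(vocab)}

def embed(text):
    v = np.zeros(len(vocab))
    for w in text.split():
        v[index[w]] += 1.0
    return v

X = np.stack([embed(t) for t in texts])

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_relaxed_predicate(X, y, steps=500, lr=0.5):
    """Continuous relaxation: learn a weight vector w so that
    sigmoid(x @ w) approximates the predicate's truth value on x."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        p = sigmoid(X @ w)
        grad = X.T @ (p - y) / len(y)  # gradient of the mean logistic loss
        w -= lr * grad
    return w

# Assumption: a fixed candidate list replaces LM-proposed predicates, and
# keyword matching replaces an LM judging whether a predicate holds.
candidates = {
    "discusses COVID": lambda t: "covid" in t.split(),
    "discusses sports": lambda t: "football" in t.split(),
    "mentions something new": lambda t: "new" in t.split(),
}

def discretize(w, texts, X, candidates):
    """Pick the candidate whose 0/1 truth values correlate best
    with the relaxed scores x @ w."""
    scores = X @ w
    best, best_corr = None, -np.inf
    for name, holds in candidates.items():
        truth = np.array([float(holds(t)) for t in texts])
        if truth.std() == 0:  # constant predicate: correlation undefined
            continue
        corr = np.corrcoef(scores, truth)[0, 1]
        if corr > best_corr:
            best, best_corr = name, corr
    return best

w = fit_relaxed_predicate(X, labels)
best_predicate = discretize(w, texts, X, candidates)
print(best_predicate)
```

In this sketch the continuous vector `w` is only an intermediate; the final model parameter is the selected natural language string, which is what makes the result directly readable.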