Controlled generation refers to the problem of creating text that contains stylistic or semantic attributes of interest. Many approaches reduce this problem to building a predictor of the desired attribute. For example, researchers hoping to deploy a large language model to produce non-toxic content may use a toxicity classifier to filter generated text. In this paper, we show that the performance of controlled generation may be poor if the target distribution of text differs from the distribution the predictor was trained on. Instead, we take inspiration from causal representation learning and cast controlled generation under distribution shift as an invariant learning problem: the most effective predictor should be invariant across multiple text environments. Experiments demonstrate the promise and difficulty of adapting invariant learning methods, which have been primarily developed for vision, to text.
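As a rough illustration of the predictor-as-filter setup the abstract describes, the sketch below keeps only the generated candidates that an attribute predictor scores below a threshold. It is not the paper's method: `filter_generations`, `keyword_toxicity_score`, the word list, and the threshold are all hypothetical stand-ins, and a real system would use a learned classifier in place of the keyword check.

```python
from typing import Callable, Iterable, List


def filter_generations(
    candidates: Iterable[str],
    attribute_score: Callable[[str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Keep candidates whose predicted undesired-attribute score is below threshold."""
    return [text for text in candidates if attribute_score(text) < threshold]


def keyword_toxicity_score(text: str) -> float:
    """Toy stand-in for a learned toxicity classifier (hypothetical, for illustration)."""
    blocked = {"idiot", "stupid"}  # assumed word list, not taken from the paper
    words = [w.strip(".,!?").lower() for w in text.split()]
    return 1.0 if any(w in blocked for w in words) else 0.0


if __name__ == "__main__":
    samples = ["You are an idiot.", "Have a nice day."]
    print(filter_generations(samples, keyword_toxicity_score))
```

The quality of the filtered output depends entirely on the predictor. If it was fit on text unlike what the generator produces, its scores may be unreliable, which is the distribution-shift concern the abstract raises.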
Author Information
Claudia Shi (Columbia University)
Carolina Zheng (Columbia University)
Keyon Vafa (Columbia University)
Amir Feder (Columbia University)
Amir Feder is a Postdoctoral Research Scientist in the Data Science Institute, working with Professor David Blei on causal inference and natural language processing. His research seeks to develop methods that integrate causality into natural language processing, and use them to build linguistically-informed algorithms for predicting and understanding human behavior. Through the paradigm of causal machine learning, Amir aims to build bridges between machine learning and the social sciences. Before joining Columbia, Amir received his PhD from the Technion, where he was advised by Roi Reichart and worked closely with Uri Shalit. In a previous (academic) life, Amir was an economics, statistics and history student at Tel Aviv University, the Hebrew University of Jerusalem and Northwestern University. Amir was the organizer of the First Workshop on Causal Inference and NLP (CI+NLP) at EMNLP 2021.
David Blei (Columbia University)
David Blei is a Professor of Statistics and Computer Science at Columbia University, and a member of the Columbia Data Science Institute. His research is in statistical machine learning, involving probabilistic topic models, Bayesian nonparametric methods, and approximate posterior inference algorithms for massive data. He works on a variety of applications, including text, images, music, social networks, user behavior, and scientific data. David has received several awards for his research, including a Sloan Fellowship (2010), Office of Naval Research Young Investigator Award (2011), Presidential Early Career Award for Scientists and Engineers (2011), Blavatnik Faculty Award (2013), and ACM-Infosys Foundation Award (2013). He is a fellow of the ACM.
Related Events (a corresponding poster, oral, or spotlight)
-
2022 : An Invariant Learning Characterization of Controlled Text Generation »
More from the Same Authors
-
2021 : Modeling Worker Career Trajectories with Neural Sequence Models »
Keyon Vafa -
2021 : Unveiling Mode-connectivity of the ELBO Landscape »
Edith Zhang · David Blei -
2022 : A Bayesian Causal Inference Approach for Assessing Fairness in Clinical Decision-Making »
Linying Zhang · Lauren Richter · Yixin Wang · Anna Ostropolets · Noemie Elhadad · David Blei · George Hripcsak -
2022 : Adjusting the Gender Wage Gap with a Low-Dimensional Representation of Job History »
Keyon Vafa · Susan Athey · David Blei -
2022 : CAREER: Economic Prediction of Labor Sequence Data Under Distribution Shift »
Keyon Vafa · Emil Palikot · Tianyu Du · Ayush Kanodia · Susan Athey · David Blei -
2022 : Useful Confidence Measures: Beyond the Max Score »
Gal Yona · Amir Feder · Itay Laish -
2022 : An Invariant Learning Characterization of Controlled Text Generation »
Claudia Shi · Carolina Zheng · Keyon Vafa · Amir Feder · David Blei -
2022 Poster: CEBaB: Estimating the Causal Effects of Real-World Concepts on NLP Model Behavior »
Eldar D Abraham · Karel D'Oosterlinck · Amir Feder · Yair Gat · Atticus Geiger · Christopher Potts · Roi Reichart · Zhengxuan Wu -
2022 Poster: In the Eye of the Beholder: Robust Prediction with Causal User Modeling »
Amir Feder · Guy Horowitz · Yoav Wald · Roi Reichart · Nir Rosenfeld -
2021 : David Blei - On the Assumptions of Synthetic Control Methods »
David Blei -
2021 Test Of Time: Online Learning for Latent Dirichlet Allocation »
Matthew Hoffman · Francis Bach · David Blei -
2021 Poster: Posterior Collapse and Latent Variable Non-identifiability »
Yixin Wang · David Blei · John Cunningham -
2021 Poster: On Calibration and Out-of-Domain Generalization »
Yoav Wald · Amir Feder · Daniel Greenfeld · Uri Shalit -
2020 Workshop: I Can’t Believe It’s Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning »
Jessica Forde · Francisco Ruiz · Melanie Fernandez Pradier · Aaron Schein · Finale Doshi-Velez · Isabel Valera · David Blei · Hanna Wallach -
2020 Poster: Markovian Score Climbing: Variational Inference with KL(p||q) »
Christian Naesseth · Fredrik Lindsten · David Blei -
2019 Poster: Discrete Flows: Invertible Generative Models of Discrete Data »
Dustin Tran · Keyon Vafa · Kumar Agrawal · Laurent Dinh · Ben Poole -
2019 Poster: Poisson-Randomized Gamma Dynamical Systems »
Aaron Schein · Scott Linderman · Mingyuan Zhou · David Blei · Hanna Wallach -
2019 Poster: Variational Bayes under Model Misspecification »
Yixin Wang · David Blei -
2019 Poster: Using Embeddings to Correct for Unobserved Confounding in Networks »
Victor Veitch · Yixin Wang · David Blei -
2019 Poster: Adapting Neural Networks for the Estimation of Treatment Effects »
Claudia Shi · David Blei · Victor Veitch -
2018 : Datasets and Benchmarks for Causal Learning »
Csaba Szepesvari · Isabelle Guyon · Nicolai Meinshausen · David Blei · Elias Bareinboim · Bernhard Schölkopf · Pietro Perona -
2018 : The Blessings of Multiple Causes »
David Blei -
2017 : Panel: On the Foundations and Future of Approximate Inference »
David Blei · Zoubin Ghahramani · Katherine Heller · Tim Salimans · Max Welling · Matthew D. Hoffman -
2017 Workshop: Advances in Approximate Bayesian Inference »
Francisco Ruiz · Stephan Mandt · Cheng Zhang · James McInerney · Dustin Tran · David Blei · Max Welling · Tamara Broderick · Michalis Titsias -
2017 Poster: Hierarchical Implicit Models and Likelihood-Free Variational Inference »
Dustin Tran · Rajesh Ranganath · David Blei -
2017 Poster: Structured Embedding Models for Grouped Data »
Maja Rudolph · Francisco Ruiz · Susan Athey · David Blei -
2017 Poster: Variational Inference via $\chi$ Upper Bound Minimization »
Adji Bousso Dieng · Dustin Tran · Rajesh Ranganath · John Paisley · David Blei -
2017 Poster: Context Selection for Embedding Models »
Liping Liu · Francisco Ruiz · Susan Athey · David Blei -
2016 : Causal Inference for Recommendation Systems »
David Blei -
2016 : Panel Discussion »
Shakir Mohamed · David Blei · Ryan Adams · José Miguel Hernández-Lobato · Ian Goodfellow · Yarin Gal -
2016 : Deep exponential families »
David Blei -
2016 Workshop: Advances in Approximate Bayesian Inference »
Tamara Broderick · Stephan Mandt · James McInerney · Dustin Tran · David Blei · Kevin Murphy · Andrew Gelman · Michael I Jordan -
2016 Poster: Operator Variational Inference »
Rajesh Ranganath · Dustin Tran · Jaan Altosaar · David Blei -
2016 Poster: The Generalized Reparameterization Gradient »
Francisco Ruiz · Michalis Titsias · David Blei -
2016 Poster: Exponential Family Embeddings »
Maja Rudolph · Francisco Ruiz · Stephan Mandt · David Blei -
2016 Tutorial: Variational Inference: Foundations and Modern Methods »
David Blei · Shakir Mohamed · Rajesh Ranganath -
2015 Workshop: Advances in Approximate Bayesian Inference »
Dustin Tran · Tamara Broderick · Stephan Mandt · James McInerney · Shakir Mohamed · Alp Kucukelbir · Matthew D. Hoffman · Neil Lawrence · David Blei -
2015 Poster: The Population Posterior and Bayesian Modeling on Streams »
James McInerney · Rajesh Ranganath · David Blei -
2015 Poster: Automatic Variational Inference in Stan »
Alp Kucukelbir · Rajesh Ranganath · Andrew Gelman · David Blei -
2015 Spotlight: Automatic Variational Inference in Stan »
Alp Kucukelbir · Rajesh Ranganath · Andrew Gelman · David Blei -
2015 Poster: Copula variational inference »
Dustin Tran · David Blei · Edo M Airoldi