Word embeddings are a powerful approach to capturing semantic similarity among terms in a vocabulary. In this paper, we develop exponential family embeddings, which extend the idea of word embeddings to other types of high-dimensional data. As examples, we study several types of data: neural data with real-valued observations, count data from a market basket analysis, and ratings data from a movie recommendation system. The main idea is that each observation is modeled conditioned on a set of latent embeddings and other observations, called the context, where the way the context is defined depends on the problem. In language the context is the surrounding words; in neuroscience the context is close-by neurons; in market basket data the context is other items in the shopping cart. Each instance of an embedding defines the context, the exponential family of conditional distributions, and how the embedding vectors are shared across data. We infer the embeddings with stochastic gradient descent, with an algorithm that connects closely to generalized linear models. On all three of our applications (neural activity of zebrafish, users’ shopping behavior, and movie ratings), we find that exponential family embedding models are more effective than other dimension reduction methods. They better reconstruct held-out data and find interesting qualitative structure.
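To make the main idea concrete, here is a minimal sketch of one possible instance, not the authors' implementation. It assumes a unit-variance Gaussian conditional with an identity link, so the conditional mean of an observation is the inner product of its embedding with a weighted sum of context vectors. The names `rho`, `alpha`, `natural_parameter`, `conditional_log_lik`, and `sgd_step` are illustrative choices, and the parameterization itself is an assumption that the abstract does not spell out. Count data would call for a different family (for example Poisson), which changes the link and the gradient.

```python
import numpy as np


def natural_parameter(rho, alpha, x, i, context):
    """Inner product of item i's embedding with the weighted sum of context vectors.

    rho:     (N, K) embedding vectors (assumed parameterization)
    alpha:   (N, K) context vectors (assumed parameterization)
    x:       (N,) observed values
    context: list of distinct indices that excludes i
    """
    context_sum = alpha[context].T @ x[context]  # shape (K,)
    return rho[i] @ context_sum


def conditional_log_lik(rho, alpha, x, i, context):
    """Unit-variance Gaussian log-density of x[i] given its context, up to a constant."""
    eta = natural_parameter(rho, alpha, x, i, context)
    return -0.5 * (x[i] - eta) ** 2


def sgd_step(rho, alpha, x, i, context, lr=0.01):
    """One stochastic gradient ascent step on the conditional log-likelihood of x[i].

    Updates rho[i] and alpha[context] in place.
    """
    context_sum = alpha[context].T @ x[context]
    residual = x[i] - rho[i] @ context_sum
    # Both gradients are computed from the old parameters before any update.
    grad_rho = residual * context_sum                      # shape (K,)
    grad_alpha = residual * np.outer(x[context], rho[i])   # shape (len(context), K)
    rho[i] += lr * grad_rho
    alpha[context] += lr * grad_alpha


if __name__ == "__main__":
    # Toy data: four observations, two latent dimensions (all values hypothetical).
    rho = np.full((4, 2), 0.1)
    alpha = np.full((4, 2), 0.2)
    x = np.array([1.0, 0.5, -0.3, 2.0])
    context = [1, 2, 3]  # every other observation serves as context for item 0

    print("before:", conditional_log_lik(rho, alpha, x, 0, context))
    sgd_step(rho, alpha, x, 0, context)
    print("after: ", conditional_log_lik(rho, alpha, x, 0, context))
```

The gradient with respect to `rho[i]` is a residual times a feature vector, which is the same form as a linear regression update. This is one way to read the abstract's remark that the algorithm connects closely to generalized linear models.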
Author Information
Maja Rudolph (Columbia University)
Francisco Ruiz (Columbia University)
Stephan Mandt (Disney Research)

Stephan Mandt is an Associate Professor of Computer Science and Statistics at the University of California, Irvine. From 2016 until 2018, he was a Senior Researcher and Head of the statistical machine learning group at Disney Research in Pittsburgh and Los Angeles. He previously held postdoctoral positions at Columbia University and Princeton University. Stephan holds a Ph.D. in Theoretical Physics from the University of Cologne, where he received the German National Merit Scholarship. He is a recipient of the NSF CAREER Award, the UCI ICS Mid-Career Excellence in Research Award, and the German Research Foundation's Mercator Fellowship. He is also a Kavli Fellow of the U.S. National Academy of Sciences, a member of the ELLIS Society, and a former visiting researcher at Google Brain. Stephan regularly serves as an Area Chair, Action Editor, or Editorial Board member for NeurIPS, ICML, AAAI, ICLR, TMLR, and JMLR. His research is currently supported by NSF, DARPA, DOE, Disney, Intel, and Qualcomm.
David Blei (Columbia University)
David Blei is a Professor of Statistics and Computer Science at Columbia University, and a member of the Columbia Data Science Institute. His research is in statistical machine learning, involving probabilistic topic models, Bayesian nonparametric methods, and approximate posterior inference algorithms for massive data. He works on a variety of applications, including text, images, music, social networks, user behavior, and scientific data. David has received several awards for his research, including a Sloan Fellowship (2010), Office of Naval Research Young Investigator Award (2011), Presidential Early Career Award for Scientists and Engineers (2011), Blavatnik Faculty Award (2013), and ACM-Infosys Foundation Award (2013). He is a fellow of the ACM.
More from the Same Authors
-
2021 : Analyzing High-Resolution Clouds and Convection using Multi-Channel VAEs »
Harshini Mangipudi · Griffin Mooers · Mike Pritchard · Tom Beucler · Stephan Mandt -
2021 : Structured Stochastic Gradient MCMC: a hybrid VI and MCMC approach »
Antonios Alexos · Alex Boyd · Stephan Mandt -
2021 : Unveiling Mode-connectivity of the ELBO Landscape »
Edith Zhang · David Blei -
2022 : Probabilistic Querying of Continuous-Time Sequential Events »
Alex Boyd · Yuxin Chang · Stephan Mandt · Padhraic Smyth -
2022 : An Invariant Learning Characterization of Controlled Text Generation »
Claudia Shi · Carolina Zheng · Keyon Vafa · Amir Feder · David Blei -
2022 : A Bayesian Causal Inference Approach for Assessing Fairness in Clinical Decision-Making »
Linying Zhang · Lauren Richter · Yixin Wang · Anna Ostropolets · Noemie Elhadad · David Blei · George Hripcsak -
2022 : Adjusting the Gender Wage Gap with a Low-Dimensional Representation of Job History »
Keyon Vafa · Susan Athey · David Blei -
2022 : An Unsupervised Learning Perspective on the Dynamic Contribution to Extreme Precipitation Changes »
Griffin Mooers · Tom Beucler · Mike Pritchard · Stephan Mandt -
2022 : CAREER: Economic Prediction of Labor Sequence Data Under Distribution Shift »
Keyon Vafa · Emil Palikot · Tianyu Du · Ayush Kanodia · Susan Athey · David Blei -
2023 Workshop: Deep Generative Models for Health »
Emanuele Palumbo · Laura Manduchi · Sonia Laguna · Melanie F. Pradier · Vincent Fortuin · Stephan Mandt · Julia Vogt -
2022 : Q & A »
Karen Ullrich · Yibo Yang · Stephan Mandt -
2022 Tutorial: Data Compression with Machine Learning »
Karen Ullrich · Yibo Yang · Stephan Mandt -
2022 : Tutorial part 1 »
Yibo Yang · Karen Ullrich · Stephan Mandt -
2022 Poster: Predictive Querying for Autoregressive Neural Sequence Models »
Alex Boyd · Samuel Showalter · Stephan Mandt · Padhraic Smyth -
2021 : David Blei - On the Assumptions of Synthetic Control Methods »
David Blei -
2021 Test Of Time: Online Learning for Latent Dirichlet Allocation »
Matthew Hoffman · Francis Bach · David Blei -
2021 Poster: Detecting and Adapting to Irregular Distribution Shifts in Bayesian Online Learning »
Aodong Li · Alex Boyd · Padhraic Smyth · Stephan Mandt -
2021 Poster: Posterior Collapse and Latent Variable Non-identifiability »
Yixin Wang · David Blei · John Cunningham -
2020 : Q/A and Discussion for ML Theory Session »
Karthik Kashinath · Mayur Mudigonda · Stephan Mandt · Rose Yu -
2020 : Stephan Mandt »
Stephan Mandt -
2020 Workshop: I Can’t Believe It’s Not Better! Bridging the gap between theory and empiricism in probabilistic machine learning »
Jessica Forde · Francisco Ruiz · Melanie Fernandez Pradier · Aaron Schein · Finale Doshi-Velez · Isabel Valera · David Blei · Hanna Wallach -
2020 Poster: Markovian Score Climbing: Variational Inference with KL(p||q) »
Christian Naesseth · Fredrik Lindsten · David Blei -
2020 Poster: User-Dependent Neural Sequence Models for Continuous-Time Event Data »
Alex Boyd · Robert Bamler · Stephan Mandt · Padhraic Smyth -
2020 Poster: Improving Inference for Neural Image Compression »
Yibo Yang · Robert Bamler · Stephan Mandt -
2019 Poster: Deep Generative Video Compression »
Salvator Lombardo · Jun Han · Christopher Schroers · Stephan Mandt -
2019 Poster: Poisson-Randomized Gamma Dynamical Systems »
Aaron Schein · Scott Linderman · Mingyuan Zhou · David Blei · Hanna Wallach -
2019 Poster: Variational Bayes under Model Misspecification »
Yixin Wang · David Blei -
2019 Poster: Using Embeddings to Correct for Unobserved Confounding in Networks »
Victor Veitch · Yixin Wang · David Blei -
2019 Poster: Adapting Neural Networks for the Estimation of Treatment Effects »
Claudia Shi · David Blei · Victor Veitch -
2018 : Datasets and Benchmarks for Causal Learning »
Csaba Szepesvari · Isabelle Guyon · Nicolai Meinshausen · David Blei · Elias Bareinboim · Bernhard Schölkopf · Pietro Perona -
2018 : The Blessings of Multiple Causes »
David Blei -
2017 : Panel: On the Foundations and Future of Approximate Inference »
David Blei · Zoubin Ghahramani · Katherine Heller · Tim Salimans · Max Welling · Matthew D. Hoffman -
2017 : Introduction »
Cheng Zhang · Francisco Ruiz · Dustin Tran · James McInerney · Stephan Mandt -
2017 Workshop: Advances in Approximate Bayesian Inference »
Francisco Ruiz · Stephan Mandt · Cheng Zhang · James McInerney · Dustin Tran · David Blei · Max Welling · Tamara Broderick · Michalis Titsias -
2017 Poster: Perturbative Black Box Variational Inference »
Robert Bamler · Cheng Zhang · Manfred Opper · Stephan Mandt -
2017 Poster: Hierarchical Implicit Models and Likelihood-Free Variational Inference »
Dustin Tran · Rajesh Ranganath · David Blei -
2017 Poster: Structured Embedding Models for Grouped Data »
Maja Rudolph · Francisco Ruiz · Susan Athey · David Blei -
2017 Poster: Variational Inference via $\chi$ Upper Bound Minimization »
Adji Bousso Dieng · Dustin Tran · Rajesh Ranganath · John Paisley · David Blei -
2017 Poster: Context Selection for Embedding Models »
Liping Liu · Francisco Ruiz · Susan Athey · David Blei -
2016 : Causal Inference for Recommendation Systems »
David Blei -
2016 : Panel Discussion »
Shakir Mohamed · David Blei · Ryan Adams · José Miguel Hernández-Lobato · Ian Goodfellow · Yarin Gal -
2016 : Deep exponential families »
David Blei -
2016 Workshop: Advances in Approximate Bayesian Inference »
Tamara Broderick · Stephan Mandt · James McInerney · Dustin Tran · David Blei · Kevin Murphy · Andrew Gelman · Michael I Jordan -
2016 Poster: Operator Variational Inference »
Rajesh Ranganath · Dustin Tran · Jaan Altosaar · David Blei -
2016 Poster: The Generalized Reparameterization Gradient »
Francisco Ruiz · Michalis Titsias · David Blei -
2016 Tutorial: Variational Inference: Foundations and Modern Methods »
David Blei · Shakir Mohamed · Rajesh Ranganath -
2015 : Finding Sparse Features in Strongly Confounded Medical Data »
Stephan Mandt · Florian Wenzel -
2015 Workshop: Advances in Approximate Bayesian Inference »
Dustin Tran · Tamara Broderick · Stephan Mandt · James McInerney · Shakir Mohamed · Alp Kucukelbir · Matthew D. Hoffman · Neil Lawrence · David Blei -
2015 Poster: The Population Posterior and Bayesian Modeling on Streams »
James McInerney · Rajesh Ranganath · David Blei -
2015 Poster: Automatic Variational Inference in Stan »
Alp Kucukelbir · Rajesh Ranganath · Andrew Gelman · David Blei -
2015 Spotlight: Automatic Variational Inference in Stan »
Alp Kucukelbir · Rajesh Ranganath · Andrew Gelman · David Blei -
2015 Poster: Infinite Factorial Dynamical Model »
Isabel Valera · Francisco Ruiz · Lennart Svensson · Fernando Perez-Cruz -
2015 Poster: Copula variational inference »
Dustin Tran · David Blei · Edo M Airoldi -
2014 Poster: Smoothed Gradients for Stochastic Variational Inference »
Stephan Mandt · David Blei -
2012 Poster: Bayesian Nonparametric Modeling of Suicide Attempts »
Francisco Ruiz · Isabel Valera · Carlos Blanco · Fernando Perez-Cruz -
2012 Spotlight: Bayesian Nonparametric Modeling of Suicide Attempts »
Francisco Ruiz · Isabel Valera · Carlos Blanco · Fernando Perez-Cruz