We’ve all been there. A creative spark leads to a beautiful idea. We love the idea, we nurture it, and name it. The idea is elegant: all who hear it fawn over it. The idea is justified: all of the literature we have read supports it. But, lo and behold: once we sit down to implement the idea, it doesn’t work. We check our code for software bugs. We rederive our derivations. We try again and still, it doesn’t work. We Can’t Believe It’s Not Better [1].
In this workshop, we will encourage probabilistic machine learning researchers who Can’t Believe It’s Not Better to share their beautiful idea, tell us why it should work, and hypothesize why it does not in practice. We also welcome work that highlights pathologies or unexpected behaviors in well-established practices. This workshop will stress the quality and thoroughness of the scientific procedure, promoting transparency, deeper understanding, and more principled science.
Focusing on the probabilistic machine learning community will facilitate this endeavor, not only by gathering experts who speak the same language, but also by exploiting the modularity of the probabilistic framework. Probabilistic machine learning separates modeling assumptions, inference, and model checking into distinct phases [2]; this facilitates criticism when the final outcome does not meet prior expectations. We aim to create an open-minded and diverse space for researchers to share unexpected or negative results and help one another improve their ideas.
Sat 4:45 a.m. – 5:00 a.m.

Intro
(Welcome Intro)

Aaron Schein, Melanie F. Pradier 
Sat 5:00 a.m. – 5:30 a.m.

Invited Talk: Max Welling – The LIAR (Learning with Interval Arithmetic Regularization) is Dead
(Talk)
Two years ago we embarked on a project called LIAR. LIAR was going to quantify the uncertainty of a network through interval arithmetic (IA) calculations (which are an official IEEE standard). IA has the beautiful property that the answer of your computation is guaranteed to lie in a computed interval, and as such quantifies very precisely the numerical precision of your computation. Captured by this elegant idea, we applied it to neural networks. In particular, the idea was to add a regularization term to the objective that would try to keep the interval of the network’s output small. This is particularly interesting in the context of quantization, where we quite naturally have intervals for the weights, activations and inputs due to their limited precision. By training a full precision neural network with intervals that represent the quantization error, and by encouraging the network to keep the resultant variation in the predictions small, we hoped to learn networks that were inherently robust to quantization noise. So much for the good news. In this talk I will try to reconstruct the process of how the project ended up on the scrap pile. I will also try to produce some “lessons learned” from this project and hopefully deliver some advice for those who are going through a similar situation. I still can’t believe it didn’t work better ;)
Max Welling 
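For concreteness, here is a minimal numpy sketch of the mechanism described above: textbook interval arithmetic pushed through one linear + ReLU layer, with a penalty on the width of the output interval. This is our illustration of the idea, not the LIAR code; `eps` stands in for a hypothetical quantization error.

```python
import numpy as np

def interval_linear(W, b, x_lo, x_hi):
    """Propagate the interval [x_lo, x_hi] through y = W @ x + b.
    Positive weights map lower bounds to lower bounds; negative
    weights swap the endpoints (standard interval arithmetic)."""
    W_pos, W_neg = np.maximum(W, 0), np.minimum(W, 0)
    y_lo = W_pos @ x_lo + W_neg @ x_hi + b
    y_hi = W_pos @ x_hi + W_neg @ x_lo + b
    return y_lo, y_hi

def interval_relu(x_lo, x_hi):
    # ReLU is monotone, so it maps interval endpoints to endpoints.
    return np.maximum(x_lo, 0), np.maximum(x_hi, 0)

rng = np.random.default_rng(0)
W, b = rng.normal(size=(3, 4)), rng.normal(size=3)
x = rng.normal(size=4)
eps = 0.05  # hypothetical quantization error on the inputs

lo, hi = interval_relu(*interval_linear(W, b, x - eps, x + eps))
# A LIAR-style regularizer would add the output interval width to the
# task loss, encouraging predictions that are robust to quantization.
interval_penalty = np.sum(hi - lo)
print(interval_penalty)
```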
Sat 5:30 a.m. – 6:00 a.m.

Invited Talk: Danielle Belgrave – Machine Learning for Personalised Healthcare: Why is it not better?
(Talk)
This talk presents an overview of probabilistic graphical modelling as a strategy for understanding heterogeneous subgroups of patients. The identification of such subgroups may elucidate underlying causal mechanisms, which may lead to more targeted treatment and intervention strategies. We will look at (1) the ideal of personalisation within the context of machine learning for healthcare, (2) “from the ideal to the reality”, and (3) some of the possible pathways for making the ideal of personalised healthcare a reality. The last part of this talk focuses on the pipeline of personalisation and looks at how probabilistic graphical models form part of that pipeline.
Danielle Belgrave 
Sat 6:00 a.m. – 6:30 a.m.

Invited Talk: Mike Hughes – The Case for Prediction Constrained Training
(Talk)
This talk considers adding supervision to well-known generative latent variable models (LVMs), including both classic LVMs (e.g. mixture models, topic models) and more recent “deep” flavors (e.g. variational autoencoders). The standard way to add supervision to LVMs would be to treat the added label as another observed variable generated by the graphical model, and then maximize the joint likelihood of both labels and features. We find that across many models, this standard supervision leads to surprisingly negligible improvement in prediction quality over a more naive baseline that first fits an unsupervised model, and then makes predictions given that model’s learned low-dimensional representation. We can’t believe it is not better! Further, this problem is not properly solved by previous approaches that just upweight or “replicate” labels in the generative model (the problem is not just that we have more observed features than labels). Instead, we suggest the problem is related to model misspecification, and that the joint likelihood objective does not properly encode the desired performance goals at test time (we care about predicting labels from features, but not features from labels). This motivates a new training objective we call prediction-constrained training, which can prioritize the label-from-feature prediction task while still delivering reasonable generative models for the observed features. We highlight promising results of our proposed prediction-constrained framework including recent extensions to semi-supervised VAEs and model-based reinforcement learning.
Mike Hughes 
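A toy sketch of the prediction-constrained idea (our illustration, with names like `pc_objective` chosen by us, not taken from the authors' code): a two-component Gaussian mixture on 1-d features plus a logistic label head, where the multiplier `lam` prioritizes the label-from-feature loss in a way the plain joint likelihood cannot.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)
# Toy data: 1-d features from two overlapping Gaussians, noisy binary labels.
x = np.concatenate([rng.normal(-1, 1, 500), rng.normal(1, 1, 500)])
y = (x + rng.normal(0, 0.5, size=1000) > 0).astype(float)

def gen_nll(mu):
    """Generative NLL of an equal-weight two-component Gaussian mixture."""
    return -np.log(0.5 * norm.pdf(x, mu[0], 1) + 0.5 * norm.pdf(x, mu[1], 1)).mean()

def pred_nll(w):
    """Label-from-feature NLL of a logistic head on the raw feature."""
    p = 1.0 / (1.0 + np.exp(-w * x))
    return -(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12)).mean()

def pc_objective(params, lam):
    # Prediction-constrained objective: generative loss plus lam times the
    # prediction loss.  Large lam enforces the prediction constraint;
    # treating y as just one more observed variable corresponds instead to
    # a fixed, small effective weight on the prediction term.
    return gen_nll(params[:2]) + lam * pred_nll(params[2])

res = minimize(pc_objective, x0=[-0.5, 0.5, 0.1], args=(10.0,))
print(res.x)
```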
Sat 6:30 a.m. – 6:33 a.m.

Margot Selosse – A bumpy journey: exploring deep Gaussian mixture models
(Spotlight Talk)
The deep Gaussian mixture model (DGMM) is a framework directly inspired by the finite mixture of factor analysers model (MFA) and by deep learning architectures composed of multiple layers. The MFA is a generative model that considers a data point as arising from a latent variable (termed the score), which is sampled from a standard multivariate Gaussian distribution and then transformed linearly. The linear transformation matrix (termed the loading matrix) is specific to a component in the finite mixture. The DGMM consists of stacking MFA layers, in the sense that the latent scores are no longer assumed to be drawn from a standard Gaussian, but rather from a mixture of factor analysers model. Thus, intermediate latent scores serve both as the input to an MFA and as data points that have latent scores of their own. Only the latent scores of the DGMM's last layer are drawn from a standard multivariate Gaussian distribution. In recent years, the DGMM gained prominence in the literature: intuitively, this model should be able to capture distributions more precisely than a simple Gaussian mixture model. We show in this work that while the DGMM is an original and novel idea, in certain cases it is challenging to infer its parameters. In addition, we give some insights into the probable reasons for this difficulty. Experimental results are provided on github: https://github.com/ansubmissions/ICBINB, alongside an R package that implements the algorithm and a number of ready-to-run R scripts.
Margot Selosse 
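The stacked construction is short to write down generatively. Here is a minimal numpy sketch of sampling from a two-layer DGMM, with illustrative dimensions and parameters rather than the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mfa_layer(z, pi, loadings, means, noise_sd):
    """One MFA layer: pick a mixture component per sample, then apply
    that component's loading matrix and mean, plus isotropic noise."""
    k = rng.choice(len(pi), size=len(z), p=pi)
    out = np.einsum('nij,nj->ni', loadings[k], z) + means[k]
    return out + noise_sd * rng.normal(size=out.shape)

# Illustrative 2-layer DGMM: 1-d deepest scores -> 2-d scores -> 3-d data.
z2 = rng.normal(size=(1000, 1))                  # deepest scores: standard Gaussian
L2 = np.array([[[2.0], [0.0]], [[0.0], [2.0]]])  # layer-2 loadings, 2 components
m2 = np.array([[3.0, 0.0], [-3.0, 0.0]])
z1 = sample_mfa_layer(z2, [0.5, 0.5], L2, m2, 0.1)

L1 = np.stack([rng.normal(size=(3, 2)) for _ in range(2)])
m1 = np.array([[0.0, 0.0, 0.0], [5.0, 5.0, 5.0]])
x = sample_mfa_layer(z1, [0.7, 0.3], L1, m1, 0.1)  # observed data
print(x.shape)
```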
Sat 6:33 a.m. – 6:36 a.m.

Diana Cai – Power posteriors do not reliably learn the number of components in a finite mixture
(Spotlight Talk)
Scientists and engineers are often interested in learning the number of subpopulations (or components) present in a data set. Data science folk wisdom tells us that a finite mixture model (FMM) with a prior on the number of components will fail to recover the true, data-generating number of components under model misspecification. But practitioners still widely use FMMs to learn the number of components, and statistical machine learning papers can be found recommending such an approach. Increasingly, though, data science papers suggest potential alternatives beyond vanilla FMMs, such as power posteriors, coarsening, and related methods. In this work we start by adding rigor to folk wisdom and proving that, under even the slightest model misspecification, the FMM component-count posterior diverges: the posterior probability of any particular finite number of latent components converges to 0 in the limit of infinite data. We use the same theoretical techniques to show that power posteriors with fixed power face the same undesirable divergence, and we provide a proof for the case where the power converges to a nonzero constant. We illustrate the practical consequences of our theory on simulated and real data. We conjecture how our methods may be applied to lend insight into other component-count robustification techniques.
Diana Cai 
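For intuition about the object under study, here is a grid-based sketch of a power posterior for a single location parameter under misspecification (a N(mu, 1) model fit to Laplace data). It only illustrates the tempering mechanics, not the component-count posterior analyzed in the talk:

```python
import numpy as np
from scipy.stats import norm, laplace

rng = np.random.default_rng(0)
data = laplace.rvs(size=2000, random_state=rng)  # data are Laplace...
mu_grid = np.linspace(-1.0, 1.0, 401)            # ...model is N(mu, 1): misspecified
dx = mu_grid[1] - mu_grid[0]

def power_posterior(beta):
    """Grid approximation of p_beta(mu | data), proportional to
    p(data | mu)**beta * p(mu), with a N(0, 1) prior on mu.
    beta < 1 tempers (downweights) the likelihood."""
    loglik = norm.logpdf(data[:, None], mu_grid, 1).sum(axis=0)
    logpost = beta * loglik + norm.logpdf(mu_grid)
    post = np.exp(logpost - logpost.max())
    return post / (post.sum() * dx)

# Tempering widens the posterior, but for any fixed beta it still
# concentrates as n grows, the mechanism behind the divergence results.
for beta in [1.0, 0.5, 0.1]:
    p = power_posterior(beta)
    mean = np.sum(p * mu_grid) * dx
    sd = np.sqrt(np.sum(p * (mu_grid - mean) ** 2) * dx)
    print(f'beta={beta:.1f}: posterior sd = {sd:.4f}')
```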
Sat 6:36 a.m. – 6:39 a.m.

W. Ronny Huang – Understanding Generalization through Visualizations
(Spotlight Talk)
The power of neural networks lies in their ability to generalize to unseen data, yet the underlying reasons for this phenomenon remain elusive. Numerous rigorous attempts have been made to explain generalization, but available bounds are still quite loose, and analysis does not always lead to true understanding. The goal of this work is to make generalization more intuitive. Using visualization methods, we discuss the mystery of generalization, the geometry of loss landscapes, and how the curse (or, rather, the blessing) of dimensionality causes optimizers to settle into minima that generalize well. 
W. Ronny Huang 
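The workhorse behind such visualizations is a one-dimensional slice of the loss surface along a suitably normalized random direction. A small sketch on a least-squares toy problem; the per-parameter rescaling here is a simplified stand-in for the filter normalization used in the loss-landscape literature:

```python
import numpy as np
import matplotlib.pyplot as plt

def loss_slice(loss_fn, theta, n_points=51, scale=1.0, seed=0):
    """Evaluate loss_fn along a random 1-d slice through theta.  The
    direction is rescaled to the magnitude of theta, a crude stand-in
    for filter normalization."""
    rng = np.random.default_rng(seed)
    d = rng.normal(size=theta.shape)
    d *= np.abs(theta).mean() / (np.abs(d).mean() + 1e-12)
    alphas = np.linspace(-scale, scale, n_points)
    return alphas, [loss_fn(theta + a * d) for a in alphas]

# Toy 'landscape': mean squared error of a random least-squares problem,
# sliced through its minimizer.
rng = np.random.default_rng(0)
A, b = rng.normal(size=(50, 10)), rng.normal(size=50)
theta_star = np.linalg.lstsq(A, b, rcond=None)[0]
alphas, losses = loss_slice(lambda t: np.mean((A @ t - b) ** 2), theta_star)
plt.plot(alphas, losses)
plt.xlabel('alpha')
plt.ylabel('loss')
plt.show()
```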
Sat 6:39 a.m. – 6:42 a.m.

Udari Madhushani – It Doesn’t Get Better and Here’s Why: A Fundamental Drawback in Natural Extensions of UCB to Multi-agent Bandits
(Spotlight Talk)
We identify a fundamental drawback of natural extensions of Upper Confidence Bound (UCB) algorithms to the multi-agent bandit problem, in which multiple agents facing the same explore-exploit problem can share information. We provide theoretical guarantees that when agents use a natural extension of the UCB sampling rule, sharing information about the optimal option degrades their performance. For K the number of agents and T the time horizon, we prove that when agents share information only about the optimal option, they suffer an expected group cumulative regret of O(K log T + K log K), whereas when they do not share any information, they only suffer a group regret of O(K log T). Further, while information sharing about all options yields much better performance than no information sharing, we show that including information about the optimal option is not as good as sharing information only about suboptimal options.
Udari Madhushani 
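A small simulation in the spirit of the result, under one simplified reading of the sharing protocol (agents either learn only from their own pulls, or additionally observe every pull of the best arm); this is our sketch, not the paper's code:

```python
import numpy as np

def group_regret(K=5, T=2000, means=(0.9, 0.5, 0.4), share_best=False, seed=0):
    """K agents run UCB1 on the same Bernoulli bandit.  If share_best,
    every pull of arm 0 (the best arm) is observed by all agents;
    otherwise each agent learns purely from its own pulls."""
    rng = np.random.default_rng(seed)
    means = np.array(means)
    n_arms = len(means)
    counts = np.ones((K, n_arms))  # one forced initial pull per arm
    sums = rng.binomial(1, means, size=(K, n_arms)).astype(float)
    regret = 0.0
    for t in range(n_arms, T):
        ucb = sums / counts + np.sqrt(2 * np.log(t) / counts)
        arms = ucb.argmax(axis=1)
        rewards = rng.binomial(1, means[arms]).astype(float)
        regret += np.sum(means.max() - means[arms])
        for a, (arm, r) in enumerate(zip(arms, rewards)):
            if share_best and arm == 0:
                counts[:, 0] += 1  # everyone observes pulls of the best arm
                sums[:, 0] += r
            else:
                counts[a, arm] += 1
                sums[a, arm] += r
    return regret

print('no sharing:', group_regret(share_best=False))
print('share best:', group_regret(share_best=True))
```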
Sat 6:42 a.m. – 6:45 a.m.

Erik Jones – Selective Classification Can Magnify Disparities Across Groups
(Spotlight Talk)
Selective classification, in which models are allowed to abstain on uncertain predictions, is a natural approach to improving accuracy in settings where errors are costly but abstentions are manageable. In this paper, we find that while selective classification can improve average accuracies, it can simultaneously magnify existing accuracy disparities between various groups within a population, especially in the presence of spurious correlations. We observe this behavior consistently across five datasets from computer vision and NLP. Surprisingly, increasing the abstention rate can even decrease accuracies on some groups. To better understand when selective classification improves or worsens accuracy on a group, we study its margin distribution, which captures the model’s confidences over all predictions. For example, when the margin distribution is symmetric, we prove that whether selective classification monotonically improves or worsens accuracy is fully determined by the accuracy at full coverage (i.e., without any abstentions) and whether the distribution satisfies a property we term left-log-concavity. Our analysis also shows that selective classification tends to magnify accuracy disparities that are present at full coverage. Fortunately, we find that it uniformly improves each group when applied to distributionally-robust models that achieve similar full-coverage accuracies across groups. Altogether, our results imply selective classification should be used with care and underscore the importance of models that perform equally well across groups at full coverage.
Erik Jones 
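The margin-based analysis is easy to reproduce in miniature. In the synthetic sketch below (variable names are ours), one group's errors are as confident as its correct predictions, so raising the abstention rate filters out only the other group's mistakes and the disparity grows:

```python
import numpy as np

def selective_accuracy_by_group(margins, correct, groups, coverage):
    """Abstain on the lowest-confidence fraction (1 - coverage) of
    examples, then report accuracy per group on the retained ones.
    Confidence is the absolute margin."""
    thresh = np.quantile(np.abs(margins), 1 - coverage)
    keep = np.abs(margins) >= thresh
    return {g: correct[keep & (groups == g)].mean() for g in np.unique(groups)}

# Synthetic demo: group 1's errors come from a spurious feature and are
# as confident as its correct predictions ("confident mistakes").
rng = np.random.default_rng(0)
n = 10000
groups = rng.integers(0, 2, n)
correct = rng.binomial(1, np.where(groups == 0, 0.95, 0.80), n)
margins = np.where(correct == 1, rng.normal(2.0, 1.0, n),
                   np.where(groups == 0, rng.normal(0.5, 1.0, n),
                            rng.normal(2.0, 1.0, n)))
for cov in [1.0, 0.5, 0.2]:
    print(cov, selective_accuracy_by_group(margins, correct, groups, cov))
```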
Sat 6:45 a.m. – 6:48 a.m.

Yannick Rudolph – Graph Conditional Variational Models: Too Complex for Multi-agent Trajectories?
(Spotlight Talk)
Recent advances in modeling multi-agent trajectories combine graph architectures such as graph neural networks (GNNs) with conditional variational models (CVMs) such as variational RNNs (VRNNs). Originally, CVMs were proposed to facilitate learning with multimodal and structured data, and thus seem to perfectly match the requirements of multimodal multi-agent trajectories with their structured output spaces. Empirical results of VRNNs on trajectory data support this assumption. In this paper, we revisit the experiments and proposed architectures with additional rigour, ablation runs and baselines. In contrast to common belief, we show that both historic and current results with CVMs on trajectory data are misleading. Given a neural network with a graph architecture and/or structured output function, variational autoencoding does not contribute statistically significantly to empirical performance. Instead, we show that well-known emission functions do contribute, while coming with less complexity, engineering effort and computation time.
Yannick Rudolph 
Sat 6:50 a.m. – 7:00 a.m.

Coffee Break (Gather.town available: https://bit.ly/3gxkLA7)
(Coffee Break)


Sat 7:00 a.m. – 8:00 a.m.

Poster Session in gather.town: https://bit.ly/3gxkLA7
(Poster Session)
Link to access the Gather.town: https://bit.ly/3gxkLA7

Sat 8:00 a.m. – 8:15 a.m.

Charline Le Lan – Perfect density models cannot guarantee anomaly detection
(Contributed Talk)
Thanks to the tractability of their likelihood, some deep generative models show promise for seemingly straightforward but important applications like anomaly detection, uncertainty estimation, and active learning. However, the likelihood values empirically attributed to anomalies conflict with the expectations these proposed applications suggest. In this paper, we take a closer look at the behavior of distribution densities and show that these quantities carry less meaningful information than previously thought, beyond estimation issues or the curse of dimensionality. We conclude that the use of these likelihoods for out-of-distribution detection relies on strong and implicit hypotheses, and highlight the necessity of explicitly formulating these assumptions for reliable anomaly detection.
Charline Le Lan 
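The gap between high density and typicality is already visible in a spherical Gaussian: the mode has far higher density than any typical sample, yet data near the mode essentially never occur. A minimal demonstration:

```python
import numpy as np

d = 1000
rng = np.random.default_rng(0)

def log_density(x):
    """log N(x; 0, I_d), computed directly."""
    return -0.5 * (x @ x) - 0.5 * d * np.log(2 * np.pi)

x_mode = np.zeros(d)            # the highest-density point
x_typical = rng.normal(size=d)  # an actual draw from the model

print('log p(mode)   =', log_density(x_mode))
print('log p(sample) =', log_density(x_typical))
# The mode has overwhelmingly higher density than any typical sample, yet
# a point near the mode would itself be a glaring anomaly: high likelihood
# alone does not certify that a point is in-distribution.
```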
Sat 8:15 a.m. – 8:30 a.m.

Fan Bao – Variational (Gradient) Estimate of the Score Function in Energy-based Latent Variable Models
(Contributed Talk)
The learning and evaluation of energy-based latent variable models (EBLVMs) without any structural assumptions are highly challenging, because the true posteriors and the partition functions in such models are generally intractable. This paper presents variational estimates of the score function and its gradient with respect to the model parameters in a general EBLVM, referred to as VaES and VaGES respectively. The variational posterior is trained to minimize a certain divergence to the true model posterior, and the bias in both estimates can be bounded by the divergence theoretically. With a minimal model assumption, VaES and VaGES can be applied to kernelized Stein discrepancy (KSD)- and score matching (SM)-based methods to learn EBLVMs. Besides, VaES can also be used to estimate the exact Fisher divergence between the data and general EBLVMs.
Fan Bao 
Sat 8:30 a.m. – 8:45 a.m.

Emilio Jorge – Inferential Induction: A Novel Framework for Bayesian Reinforcement Learning
(Contributed Talk)
Bayesian Reinforcement Learning (BRL) offers a decision-theoretic solution to the reinforcement learning problem. While “model-based” BRL algorithms focus on maintaining a posterior distribution over models, “model-free” BRL methods try to estimate value function distributions, but make strong implicit assumptions or approximations. We describe a novel Bayesian framework, “inferential induction”, for correctly inferring value function distributions from data, which leads to a new family of BRL algorithms. We design an algorithm, Bayesian Backwards Induction (BBI), within this framework. We experimentally demonstrate that BBI is competitive with the state of the art. However, its advantage relative to existing BRL model-free methods is not as great as we had expected, particularly when the additional computational burden is taken into account.
Emilio Jorge 
Sat 9:00 a.m. – 10:00 a.m.

Lunch Break (Gather.town available: https://bit.ly/3gxkLA7)
(Lunch)


Sat 10:00 a.m. – 10:30 a.m.

Invited Talk: Andrew Gelman – It Doesn’t Work, But The Alternative Is Even Worse: Living With Approximate Computation
(Talk)
We can’t fit the models we want to fit because it takes too long to fit them on our computer. Also, we don’t know what models we want to fit until we try a few. I share some stories of struggles with data-partitioning and parameter-partitioning algorithms, what kinda worked and what didn’t.
Andrew Gelman 
Sat 10:30 a.m. – 11:00 a.m.

Invited Talk: Roger Grosse – Why Isn’t Everyone Using Second-Order Optimization?
(Talk)
In the pre-AlexNet days of deep learning, second-order optimization gave dramatic speedups and enabled training of deep architectures that seemed to be inaccessible to first-order optimization. But today, despite algorithmic advances such as K-FAC, nearly all modern neural net architectures are trained with variants of SGD and Adam. What’s holding us back from using second-order optimization? I’ll discuss three challenges to applying second-order optimization to modern neural nets: difficulty of implementation, implicit regularization effects of gradient descent, and the effect of gradient noise. All of these factors are significant, though not in the ways commonly believed.
Roger Grosse 
Sat 11:00 a.m. – 11:30 a.m.

Invited Talk: Weiwei Pan – What are Useful Uncertainties for Deep Learning and How Do We Get Them?
(Talk)
While deep learning has demonstrable success on many tasks, the point estimates provided by standard deep models can lead to overfitting and provide no uncertainty quantification on predictions. However, when models are applied to critical domains such as autonomous driving, precision health care, or criminal justice, reliable measurements of a model’s predictive uncertainty may be as crucial as correctness of its predictions. In this talk, we examine a number of deep (Bayesian) models that promise to capture complex forms of predictive uncertainty, and we examine the metrics commonly used to evaluate such uncertainties. We aim to highlight the strengths and limitations of these models as well as of the metrics; we also discuss ideas for improving both in ways that are meaningful for downstream tasks.
Weiwei Pan 
Sat 11:30 a.m. – 11:33 a.m.

Vincent Fortuin – Bayesian Neural Network Priors Revisited
(Spotlight Talk)
Isotropic Gaussian priors are the de facto standard for modern Bayesian neural network inference. However, there has been recent controversy over whether they might be to blame for the undesirable cold posterior effect. We study this question empirically and find that for densely connected networks, Gaussian priors are indeed less well suited than more heavy-tailed ones. Conversely, for convolutional architectures, Gaussian priors seem to perform well and thus cannot fully explain the cold posterior effect. These findings coincide with the empirical maximum-likelihood weight distributions discovered by standard gradient-based training.
Vincent Fortuin 
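The empirical-prior comparison can be mimicked by maximum-likelihood fitting of candidate families to a flattened weight vector. In this sketch, synthetic heavy-tailed draws stand in for the weights of a trained fully-connected layer:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-in for a trained layer's flattened weights; with a real
# network you would flatten an actual weight matrix here instead.
w = stats.t.rvs(df=3, scale=0.05, size=20000, random_state=rng)

# Maximum-likelihood fit of each candidate prior family to the weights.
for name, dist in [('gaussian', stats.norm),
                   ('laplace', stats.laplace),
                   ('student-t', stats.t)]:
    params = dist.fit(w)
    mean_ll = dist.logpdf(w, *params).mean()
    print(f'{name:9s} mean log-likelihood: {mean_ll:.3f}')
```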
Sat 11:33 a.m. – 11:36 a.m.

Ziyu Wang – Further Analysis of Outlier Detection with Deep Generative Models
(Spotlight Talk)
The recent, counterintuitive discovery that deep generative models (DGMs) can frequently assign a higher likelihood to outliers has implications both for outlier detection applications and for our overall understanding of generative modeling. In this work, we present a possible explanation for this phenomenon, starting from the observation that a model's typical set and high-density region may not coincide. From this vantage point we propose a novel outlier test, the empirical success of which suggests that the failure of existing likelihood-based outlier tests does not necessarily imply that the corresponding generative model is uncalibrated. We also conduct additional experiments to help disentangle the impact of low-level texture versus high-level semantics in differentiating outliers. In aggregate, these results suggest that modifications to the standard evaluation practices and benchmarks commonly applied in the literature are needed.
Ziyu Wang 
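A typicality-style test in the spirit of the observation above, flagging inputs whose log-likelihood falls outside the range the model assigns to its own samples, can be sketched as follows; this illustrates the idea, and the paper's actual test may differ:

```python
import numpy as np

def typicality_outlier_test(log_prob, model_samples, x, q=0.99):
    """Flag x as an outlier if its log-likelihood falls outside the
    central quantile range of log-likelihoods the model assigns to its
    *own* samples.  Note that too *high* a density is suspicious too."""
    ref = log_prob(model_samples)
    lo, hi = np.quantile(ref, [1 - q, q])
    lp = log_prob(np.atleast_2d(x))
    return (lp < lo) | (lp > hi)

# Demo with an analytically known model: a standard Gaussian in 100-d.
d, rng = 100, np.random.default_rng(0)
log_prob = lambda X: -0.5 * np.sum(X**2, axis=1) - 0.5 * d * np.log(2 * np.pi)
samples = rng.normal(size=(10000, d))
print(typicality_outlier_test(log_prob, samples, np.zeros(d)))         # mode: flagged
print(typicality_outlier_test(log_prob, samples, rng.normal(size=d)))  # typical: not
```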
Sat 11:36 a.m. – 11:39 a.m.

Siwen Yan – The Curious Case of Stacking Boosted Relational Dependency Networks
(Spotlight Talk)
Reducing bias during learning and inference is an important requirement for achieving generalizable and better-performing models. The method of stacking took a first step towards creating such models by reducing inference bias, but the question of combining stacking with a model that reduces learning bias is still largely unanswered. In statistical relational learning, ensemble models of relational trees such as boosted relational dependency networks (RDN-Boost) have been shown to reduce the learning bias. We combine RDN-Boost and stacking methods with the aim of reducing both learning and inference bias, and thereby achieving better overall performance. However, our evaluation on three relational data sets shows no significant performance improvement over the baseline models.
Siwen Yan 
Sat 11:39 a.m. – 11:42 a.m.

Maurice Frank – Problems using deep generative models for probabilistic audio source separation
(Spotlight Talk)
Recent advancements in deep generative modeling make it possible to learn prior distributions from complex data that can subsequently be used for Bayesian inference. However, we find that distributions learned by deep generative models for audio signals do not exhibit the right properties that are necessary for tasks like audio source separation using a probabilistic approach. We observe that the learned prior distributions are either discriminative and extremely peaked, or smooth and non-discriminative. We quantify this behavior for two types of deep generative models on two audio datasets.
Maurice Frank 
Sat 11:42 a.m. – 11:45 a.m.

Ramiro Camino – Oversampling Tabular Data with Deep Generative Models: Is it worth the effort?
(Spotlight Talk)
In practice, machine learning experts are often confronted with imbalanced data. Without accounting for the imbalance, common classifiers perform poorly and standard evaluation metrics mislead the practitioners about the model's performance. A common method for treating imbalanced datasets is under- and oversampling. In this process, samples are either removed from the majority class or synthetic samples are added to the minority class. In this paper, we follow up on recent developments in deep learning. We take proposals of deep generative models and study the ability of these approaches to provide realistic samples that improve performance on imbalanced classification tasks via oversampling. Across 160K+ experiments, we show that the improvements in terms of performance metrics, while shown to be significant when ranking the methods as in the literature, are often minor in absolute terms, especially compared to the required effort. Furthermore, we notice that a large part of the improvement is due to undersampling, not oversampling.
Ramiro Camino 
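A miniature version of the protocol, with plain resampling-with-replacement standing in for the synthetic minority samples that a deep generative model would produce (a sklearn-based sketch, not the paper's pipeline):

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Imbalanced binary task: 95% majority class, 5% minority class.
X, y = make_classification(n_samples=5000, weights=[0.95, 0.05], random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)

def fit_score(X_train, y_train):
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return f1_score(yte, clf.predict(Xte))

print('imbalanced: ', fit_score(Xtr, ytr))

# Oversample the minority class up to parity.  Here plain resampling with
# replacement plays the role of a generative model's synthetic samples.
rng = np.random.default_rng(0)
minority = np.flatnonzero(ytr == 1)
extra = rng.choice(minority, size=(ytr == 0).sum() - len(minority))
Xb, yb = np.vstack([Xtr, Xtr[extra]]), np.concatenate([ytr, ytr[extra]])
print('oversampled:', fit_score(Xb, yb))
```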
Sat 11:45 a.m. – 11:48 a.m.

Ângelo Gregório Lovatto – Decision-Aware Model Learning for Actor-Critic Methods: When Theory Does Not Meet Practice
(Spotlight Talk)
Actor-Critic methods are a prominent class of modern reinforcement learning algorithms based on the classic Policy Iteration procedure. Despite many successful cases, Actor-Critic methods tend to require a gigantic number of experiences and can be very unstable. Recent approaches have advocated learning and using a world model to improve sample efficiency and reduce reliance on the value function estimate. However, learning an accurate dynamics model of the world remains challenging, often requiring computationally costly and data-hungry models. More recent work has shown that learning an everywhere-accurate model is unnecessary and often detrimental to the overall task; instead, the agent should improve the world model on task-critical regions. For example, in Iterative Value-Aware Model Learning, the authors extend model-based value iteration by incorporating the value function (estimate) into the model loss function, showing that the novel model objective reflects improved performance on the end task. Therefore, it seems natural to expect that model-based Actor-Critic methods can benefit equally from learning value-aware models, improving overall task performance, or reducing the need for large, expensive models. However, we show empirically that combining Actor-Critic and value-aware model learning can be quite difficult, and that naive approaches such as maximum likelihood estimation often achieve superior performance with less computational cost. Our results suggest that, despite theoretical guarantees, learning a value-aware model in continuous domains does not ensure better performance on the overall task.
Ângelo Lovatto 
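The contrast between maximum-likelihood and value-aware model losses is compact in code. A toy sketch on a single next-state distribution, in the spirit of the Iterative Value-Aware Model Learning objective (function names are ours):

```python
import numpy as np

def mle_model_loss(counts, model_probs):
    """Maximum-likelihood model loss: cross-entropy between the observed
    transition frequencies and the model's next-state distribution."""
    freq = counts / counts.sum()
    return -np.sum(freq * np.log(model_probs + 1e-12))

def value_aware_model_loss(counts, model_probs, values):
    """Value-aware loss: penalize only mismatches that change the
    predicted expected value E[V(s')]."""
    freq = counts / counts.sum()
    return (values @ (freq - model_probs)) ** 2

# Toy next-state distribution over 3 states, with value function V.
counts = np.array([70.0, 20.0, 10.0])
V = np.array([1.0, 1.0, 0.0])        # states 0 and 1 are equally good
model = np.array([0.2, 0.7, 0.1])    # wrong in detail, right where it matters
print(mle_model_loss(counts, model))             # large: distribution is off
print(value_aware_model_loss(counts, model, V))  # ~0: same expected value
```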
Sat 11:50 a.m. – 12:00 p.m.

Coffee Break (Gather.town available: https://bit.ly/3gxkLA7)
(Break)


Sat 12:00 p.m. – 12:15 p.m.

Tin D. Nguyen – Independent versus truncated finite approximations for Bayesian nonparametric inference
(Contributed Talk)
Bayesian nonparametric models based on completely random measures (CRMs) offer flexibility when the number of clusters or latent components in a data set is unknown. However, managing the infinite dimensionality of CRMs often leads to slow computation during inference. Practical inference typically relies on either integrating out the infinite-dimensional parameter or using a finite approximation: a truncated finite approximation (TFA) or an independent finite approximation (IFA). The atom weights of TFAs are constructed sequentially, while the atoms of IFAs are independent, which facilitates more convenient inference schemes. While the approximation error of TFA has been systematically addressed, there has not yet been a similar study of IFA. We quantify the approximation error between IFAs and two common target nonparametric priors (the beta-Bernoulli process and the Dirichlet process mixture model) and prove that, in the worst case, TFAs provide more component-efficient approximations than IFAs. However, in experiments on image denoising and topic modeling tasks with real data, we find that the error of Bayesian approximation methods overwhelms any finite approximation error, and IFAs perform very similarly to TFAs.
Stan Nguyen 
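For the Dirichlet process, the two approximation styles can be written in a few lines: a truncated stick-breaking construction (TFA) versus a symmetric Dirichlet whose underlying gamma atom weights are i.i.d. (IFA). This is one common instantiation; the paper treats CRMs more generally:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, K = 2.0, 25

# TFA: truncated stick-breaking for a Dirichlet process.  Weights are
# built sequentially, so earlier atoms are stochastically larger.
v = rng.beta(1, alpha, size=K)
sticks = np.concatenate([[1.0], np.cumprod(1 - v)[:-1]])
tfa_weights = v * sticks
tfa_weights /= tfa_weights.sum()   # fold the leftover tail mass back in

# IFA: symmetric Dirichlet(alpha/K, ..., alpha/K), i.e. i.i.d.
# Gamma(alpha/K) atom weights, normalized.  The independence of the atoms
# is what enables the more convenient inference schemes.
ifa_weights = rng.dirichlet(np.full(K, alpha / K))

print('TFA top weights:', np.sort(tfa_weights)[::-1][:5])
print('IFA top weights:', np.sort(ifa_weights)[::-1][:5])
```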
Sat 12:15 p.m. – 12:30 p.m.

Ricky T. Q. Chen – Self-Tuning Stochastic Optimization with Curvature-Aware Gradient Filtering
(Contributed Talk)
Standard first-order stochastic optimization algorithms base their updates solely on the average mini-batch gradient, and it has been shown that tracking additional quantities such as the curvature can help desensitize common hyperparameters. Based on this intuition, we explore the use of exact per-sample Hessian-vector products and gradients to construct optimizers that are self-tuning and hyperparameter-free. Based on a dynamics model of the gradient, we derive a process which leads to a curvature-corrected, noise-adaptive online gradient estimate. The smoothness of our updates makes them more amenable to simple step-size selection schemes, which we also base on our estimated quantities. We prove that our model-based procedure converges in the noisy quadratic setting. Though we do not see similar gains in deep learning tasks, we can match the performance of well-tuned optimizers and, ultimately, this is an interesting step towards constructing self-tuning optimizers.
Ricky T. Q. Chen 
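A highly simplified sketch of the filtering idea on a noisy quadratic, where the exact Hessian of the toy problem stands in for per-sample Hessian-vector products and a fixed gain replaces the derived noise-adaptive one (so this is not the paper's algorithm, only its flavor):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 10
A = np.diag(np.linspace(1, 20, d))   # quadratic f(x) = 0.5 * x^T A x
noise_sd = 5.0

def noisy_grad(x):
    return A @ x + noise_sd * rng.normal(size=d)

# Filtered SGD: propagate the previous gradient estimate through the
# curvature before blending it with the new stochastic gradient.
x, g_est = np.ones(d) * 5, np.zeros(d)
lr, gain = 0.04, 0.3
for t in range(500):
    step = -lr * g_est
    x = x + step
    # Curvature correction: for a quadratic, the gradient changes by
    # exactly A @ step when the parameters move by step.
    g_pred = g_est + A @ step
    g_est = (1 - gain) * g_pred + gain * noisy_grad(x)
print('final loss:', 0.5 * x @ A @ x)
```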
Sat 12:30 p.m. – 12:45 p.m.

Elliott Gordon-Rodriguez – Uses and Abuses of the Cross-Entropy Loss: Case Studies in Modern Deep Learning
(Contributed Talk)
Modern deep learning is primarily an experimental science, in which empirical advances occasionally come at the expense of probabilistic rigor. Here we focus on one such example: namely, the use of the categorical cross-entropy loss to model data that is not strictly categorical, but rather takes values on the simplex. This practice is standard in neural network architectures with label smoothing and actor-mimic reinforcement learning, amongst others. Drawing on the recently discovered continuous-categorical distribution, we propose probabilistically-inspired alternatives to these models, providing an approach that is more principled and theoretically appealing. Through careful experimentation, including an ablation study, we identify the potential for outperformance in these models, thereby highlighting the importance of a proper probabilistic treatment, as well as illustrating some of the failure modes thereof.
Elliott Gordon-Rodriguez
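The two-class case makes the issue concrete: binary cross-entropy applied to a soft target is not the log-density of any distribution on (0, 1), whereas the continuous Bernoulli, the two-class member of the continuous-categorical family, carries a normalizing constant that depends on the parameter and therefore shifts the optimum:

```python
import numpy as np

def bce(y, lam):
    """Binary cross-entropy with a soft target y in (0, 1)."""
    return -(y * np.log(lam) + (1 - y) * np.log(1 - lam))

def continuous_bernoulli_nll(y, lam):
    """Negative log-density of the continuous Bernoulli:
    p(y | lam) = C(lam) * lam**y * (1 - lam)**(1 - y), with
    C(lam) = 2 * atanh(1 - 2*lam) / (1 - 2*lam) for lam != 1/2."""
    log_C = np.log(2 * np.arctanh(1 - 2 * lam) / (1 - 2 * lam))
    return bce(y, lam) - log_C

# Grid over lam, skipping the removable singularity at 1/2.
lams = np.concatenate([np.linspace(0.01, 0.49, 200),
                       np.linspace(0.51, 0.99, 200)])
y = 0.8
# BCE is minimized at lam = y, but the continuous-Bernoulli likelihood is
# minimized where the *mean* of the distribution matches y, which is a
# very different parameter value.
print('BCE argmin:   ', lams[bce(y, lams).argmin()])
print('CB NLL argmin:', lams[continuous_bernoulli_nll(y, lams).argmin()])
```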
Sat 12:45 p.m. – 1:45 p.m.

Poster Session (in gather.town): https://bit.ly/3gxkLA7
(Poster Session (in gather.town))
Link to access the Gather.town: https://neurips.gather.town/app/5163xhrHdSWrUZsG/ICBINB 

Sat 1:15 p.m. – 1:45 p.m.

Breakout Discussions (in gather.town): https://bit.ly/3gxkLA7
Link to access the Gather.town: https://neurips.gather.town/app/5163xhrHdSWrUZsG/ICBINB 

Sat 1:45 p.m. – 2:45 p.m.

Panel & Closing
(Panel)
A panel discussion moderated by Hanna Wallach (MSR New York). Panelists: Tamara Broderick (MIT), Laurent Dinh (Google Brain), Neil Lawrence (Cambridge), Kristian Lum (Human Rights Data Analysis Group), and Sinead Williamson (UT Austin).
Tamara Broderick, Laurent Dinh, Neil Lawrence, Kristian Lum, Hanna Wallach, Sinead Williamson 
Author Information
Jessica Forde (Brown University)
Francisco Ruiz (DeepMind)
Melanie Fernandez Pradier (Microsoft Research)
Aaron Schein (Columbia University)
Finale DoshiVelez (Harvard)
Isabel Valera (Saarland University & MPI for Intelligent Systems)
David Blei (Columbia University)
David Blei is a Professor of Statistics and Computer Science at Columbia University, and a member of the Columbia Data Science Institute. His research is in statistical machine learning, involving probabilistic topic models, Bayesian nonparametric methods, and approximate posterior inference algorithms for massive data. He works on a variety of applications, including text, images, music, social networks, user behavior, and scientific data. David has received several awards for his research, including a Sloan Fellowship (2010), Office of Naval Research Young Investigator Award (2011), Presidential Early Career Award for Scientists and Engineers (2011), Blavatnik Faculty Award (2013), and ACM-Infosys Foundation Award (2013). He is a fellow of the ACM.
Hanna Wallach (Microsoft)
More from the Same Authors

2020 Poster: Incorporating Interpretable Output Constraints in Bayesian Neural Networks »
Wanqian Yang · Lars Lorch · Moritz Graule · Himabindu Lakkaraju · Finale DoshiVelez 
2020 Spotlight: Incorporating Interpretable Output Constraints in Bayesian Neural Networks »
Wanqian Yang · Lars Lorch · Moritz Graule · Himabindu Lakkaraju · Finale DoshiVelez 
2020 Session: Orals & Spotlights Track 35: Neuroscience/Probabilistic »
Leila Wehbe · Francisco Ruiz 
2020 Poster: Markovian Score Climbing: Variational Inference with KL(p||q) »
Christian Naesseth · Fredrik Lindsten · David Blei 
2020 Poster: VarGrad: A Low-Variance Gradient Estimator for Variational Inference »
Lorenz Richter · Ayman Boustati · Nikolas Nüsken · Francisco Ruiz · Omer Deniz Akyildiz 
2020 Poster: Algorithmic recourse under imperfect causal knowledge: a probabilistic approach »
AmirHossein Karimi · Julius von Kügelgen · Bernhard Schölkopf · Isabel Valera 
2020 Spotlight: Algorithmic recourse under imperfect causal knowledge: a probabilistic approach »
AmirHossein Karimi · Julius von Kügelgen · Bernhard Schölkopf · Isabel Valera 
2020 Poster: Model-based Reinforcement Learning for Semi-Markov Decision Processes with Neural ODEs »
Jianzhun Du · Joseph Futoma · Finale DoshiVelez 
2019 Workshop: Learning with Temporal Point Processes »
Manuel Rodriguez · Le Song · Isabel Valera · Yan Liu · Abir De · Hongyuan Zha 
2019 Workshop: Retrospectives: A Venue for Self-Reflection in ML Research »
Ryan Lowe · Yoshua Bengio · Joelle Pineau · Michela Paganini · Jessica Forde · Shagun Sodhani · Abhishek Gupta · Joel Lehman · Peter Henderson · Kanika Madan · Koustuv Sinha · Xavier Bouthillier 
2019 Workshop: Workshop on Human-Centric Machine Learning »
Plamen P Angelov · Nuria Oliver · Adrian Weller · Manuel Rodriguez · Isabel Valera · Silvia Chiappa · Hoda Heidari · Niki Kilbertus 
2019 Poster: Poisson-Randomized Gamma Dynamical Systems »
Aaron Schein · Scott Linderman · Mingyuan Zhou · David Blei · Hanna Wallach 
2019 Poster: Variational Bayes under Model Misspecification »
Yixin Wang · David Blei 
2019 Poster: Using Embeddings to Correct for Unobserved Confounding in Networks »
Victor Veitch · Yixin Wang · David Blei 
2019 Poster: Adapting Neural Networks for the Estimation of Treatment Effects »
Claudia Shi · David Blei · Victor Veitch 
2018 Poster: Boosting Black Box Variational Inference »
Francesco Locatello · Gideon Dresdner · Rajiv Khanna · Isabel Valera · Gunnar Raetsch 
2018 Poster: Human-in-the-Loop Interpretability Prior »
Isaac Lage · Andrew Ross · Samuel J Gershman · Been Kim · Finale DoshiVelez 
2018 Spotlight: Human-in-the-Loop Interpretability Prior »
Isaac Lage · Andrew Ross · Samuel J Gershman · Been Kim · Finale DoshiVelez 
2018 Spotlight: Boosting Black Box Variational Inference »
Francesco Locatello · Gideon Dresdner · Rajiv Khanna · Isabel Valera · Gunnar Raetsch 
2018 Poster: Representation Balancing MDPs for Off-policy Policy Evaluation »
Yao Liu · Omer Gottesman · Aniruddh Raghu · Matthieu Komorowski · Aldo Faisal · Finale DoshiVelez · Emma Brunskill 
2018 Poster: Enhancing the Accuracy and Fairness of Human Decision Making »
Isabel Valera · Adish Singla · Manuel Gomez Rodriguez 
2018 Demonstration: Reproducing Machine Learning Research on Binder »
Jessica Forde · Tim Head · Chris Holdgraf · M Pacer · Félix-Antoine Fortin · Fernando Perez
2017 Workshop: Advances in Approximate Bayesian Inference »
Francisco Ruiz · Stephan Mandt · Cheng Zhang · James McInerney · James McInerney · Dustin Tran · Dustin Tran · David Blei · Max Welling · Tamara Broderick · Michalis Titsias 
2017 Poster: Hierarchical Implicit Models and Likelihood-Free Variational Inference »
Dustin Tran · Rajesh Ranganath · David Blei 
2017 Poster: Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes »
Taylor Killian · Samuel Daulton · Finale DoshiVelez · George Konidaris 
2017 Oral: Robust and Efficient Transfer Learning with Hidden Parameter Markov Decision Processes »
Taylor Killian · Samuel Daulton · Finale DoshiVelez · George Konidaris 
2017 Poster: Structured Embedding Models for Grouped Data »
Maja Rudolph · Francisco Ruiz · Susan Athey · David Blei 
2017 Poster: Variational Inference via χ Upper Bound Minimization »
Adji Bousso Dieng · Dustin Tran · Rajesh Ranganath · John Paisley · David Blei 
2017 Poster: Context Selection for Embedding Models »
Liping Liu · Francisco Ruiz · Susan Athey · David Blei 
2016 Workshop: Advances in Approximate Bayesian Inference »
Tamara Broderick · Stephan Mandt · James McInerney · Dustin Tran · David Blei · Kevin Murphy · Andrew Gelman · Michael I Jordan 
2016 Poster: Operator Variational Inference »
Rajesh Ranganath · Dustin Tran · Jaan Altosaar · David Blei 
2016 Poster: Poisson-Gamma dynamical systems »
Aaron Schein · Hanna Wallach · Mingyuan Zhou 
2016 Oral: Poisson-Gamma dynamical systems »
Aaron Schein · Hanna Wallach · Mingyuan Zhou 
2016 Poster: The Generalized Reparameterization Gradient »
Francisco Ruiz · Michalis Titsias · David Blei 
2016 Poster: Exponential Family Embeddings »
Maja Rudolph · Francisco Ruiz · Stephan Mandt · David Blei 
2016 Poster: Flexible Models for Microclustering with Application to Entity Resolution »
Brenda Betancourt · Giacomo Zanella · Jeffrey Miller · Hanna Wallach · Abbas Zaidi · Beka Steorts 
2016 Tutorial: Variational Inference: Foundations and Modern Methods »
David Blei · Shakir Mohamed · Rajesh Ranganath 
2015 Workshop: Bayesian Nonparametrics: The Next Generation »
Tamara Broderick · Nick Foti · Aaron Schein · Alex Tank · Hanna Wallach · Sinead Williamson 
2015 Workshop: Machine Learning From and For Adaptive User Technologies: From Active Learning & Experimentation to Optimization & Personalization »
Joseph Jay Williams · Yasin Abbasi Yadkori · Finale DoshiVelez 
2015 Workshop: Advances in Approximate Bayesian Inference »
Dustin Tran · Tamara Broderick · Stephan Mandt · James McInerney · Shakir Mohamed · Alp Kucukelbir · Matthew D. Hoffman · Neil Lawrence · David Blei 
2015 Poster: Mind the Gap: A Generative Approach to Interpretable Feature Selection and Extraction »
Been Kim · Julie A Shah · Finale DoshiVelez 
2015 Poster: The Population Posterior and Bayesian Modeling on Streams »
James McInerney · Rajesh Ranganath · David Blei 
2015 Poster: Automatic Variational Inference in Stan »
Alp Kucukelbir · Rajesh Ranganath · Andrew Gelman · David Blei 
2015 Spotlight: Automatic Variational Inference in Stan »
Alp Kucukelbir · Rajesh Ranganath · Andrew Gelman · David Blei 
2015 Poster: Copula variational inference »
Dustin Tran · David Blei · Edo M Airoldi 
2013 Workshop: Topic Models: Computation, Application, and Evaluation »
David Mimno · Amr Ahmed · Jordan Boyd-Graber · Ankur Moitra · Hanna Wallach · Alexander Smola · David Blei · Anima Anandkumar
2013 Demonstration: DiBOSS™: Digital Building Operating System Solution »
Jessica Forde · Vivek Rathod · Hooshmand Shookri · Vaibhav Bandari · Ashwath Rajan · John Min · Ariel Fan · Leon Wu · Ashish Gagneja · Doug Riecken · David Solomon · Lauren Hannah · Albert Boulanger · Roger Anderson 
2012 Poster: Topic-Partitioned Multinetwork Embeddings »
Peter Krafft · Juston S Moore · Hanna Wallach · Bruce Desmarais 
2011 Workshop: 2nd Workshop on Computational Social Science and the Wisdom of Crowds »
Winter Mason · Jennifer Wortman Vaughan · Hanna Wallach 
2010 Workshop: Computational Social Science and the Wisdom of Crowds »
Jennifer Wortman Vaughan · Hanna Wallach 
2009 Workshop: Applications for Topic Models: Text and Beyond »
David Blei · Jordan Boyd-Graber · Jonathan Chang · Katherine Heller · Hanna Wallach
2009 Poster: Rethinking LDA: Why Priors Matter »
Hanna Wallach · David Mimno · Andrew McCallum 
2009 Spotlight: Rethinking LDA: Why Priors Matter »
Hanna Wallach · David Mimno · Andrew McCallum