Workshop
Causal Inference & Machine Learning: Why now?
Elias Bareinboim · Bernhard Schölkopf · Terrence Sejnowski · Yoshua Bengio · Judea Pearl

Mon Dec 13 07:00 AM -- 03:30 PM (PST)

Machine learning has been extremely successful in many critical areas, including computer vision, natural language processing, and game-playing. Still, a growing segment of the machine learning community recognizes that fundamental pieces are missing from the AI puzzle, among them causal inference.

This recognition comes from the observation that even though causality is a central component found throughout the sciences, engineering, and many other aspects of human cognition, explicit reference to causal relationships is largely missing in current learning systems. This entails a new goal of integrating causal inference and machine learning capabilities into the next generation of intelligent systems, thus paving the way towards higher levels of intelligence and human-centric AI. The synergy goes in both directions: causal inference benefits from machine learning, and machine learning benefits from causal inference. Current machine learning systems lack the ability to leverage the invariances imprinted by the underlying causal mechanisms towards reasoning about generalizability, explainability, interpretability, and robustness. Current causal inference methods, on the other hand, lack the ability to scale up to high-dimensional settings, where current machine learning systems excel.

The goal of this workshop is to bring together researchers from both camps to initiate principled discussions about the integration of causal reasoning and machine learning perspectives to help tackle the challenging AI tasks of the coming decades. We welcome researchers from all relevant disciplines, including but not limited to computer science, cognitive science, robotics, mathematics, statistics, physics, and philosophy.

 Mon 7:00 a.m. - 7:10 a.m. Intro 🔗 Mon 7:10 a.m. - 7:30 a.m. Uri Shalit - Calibration, out-of-distribution generalization and a path towards causal representations (Invited Talk) Uri Shalit 🔗 Mon 7:30 a.m. - 7:50 a.m. Julius von Kügelgen - Independent mechanism analysis, a new concept? (Invited Talk)    Independent component analysis provides a principled framework for unsupervised representation learning, with solid theory on the identifiability of the latent code that generated the data, given only observations of mixtures thereof. Unfortunately, when the mixing is nonlinear, the model is provably nonidentifiable, since statistical independence alone does not sufficiently constrain the problem. Identifiability can be recovered in settings where additional, typically observed variables are included in the generative process. We investigate an alternative path and consider instead including assumptions reflecting the principle of independent causal mechanisms exploited in the field of causality. Specifically, our approach is motivated by thinking of each source as independently influencing the mixing process. This gives rise to a framework which we term independent mechanism analysis. We provide theoretical and empirical evidence that our approach circumvents a number of nonidentifiability issues arising in nonlinear blind source separation. Reference: https://arxiv.org/abs/2106.05200 (accepted at: NeurIPS 2021) Julius von Kügelgen 🔗 Mon 7:50 a.m. - 8:10 a.m. David Blei - On the Assumptions of Synthetic Control Methods (Invited Talk) David Blei 🔗 Mon 8:10 a.m. - 8:25 a.m. Session 1: Q&A (Q&A) 🔗 Mon 8:30 a.m. - 8:50 a.m. Ricardo Silva - The Road to Causal Programming (Invited Talk) Ricardo Silva 🔗 Mon 8:50 a.m. - 9:10 a.m. 
Aapo Hyvarinen - Causal discovery by generative modelling (Invited Talk)    There is a deep connection between causal discovery and generative models, such as factor analysis, independent component analysis, and various unsupervised deep learning models. Two key concepts that emerge are identifiability and nonstationarity. In this talk, I will review this research, providing some historical perspectives as well as open questions for future research. Aapo Hyvarinen 🔗 Mon 9:10 a.m. - 9:35 a.m. Tobias Gerstenberg - Going beyond the here and now: Counterfactual simulation in human cognition (Invited Talk)    As humans, we spend much of our time going beyond the here and now. We dwell on the past, long for the future, and ponder how things could have turned out differently. In this talk, I will argue that people's knowledge of the world is organized around causally structured mental models, and that much of human thought can be understood as cognitive operations over these mental models. Specifically, I will highlight the pervasiveness of counterfactual thinking in human cognition. Counterfactuals are critical for how people make causal judgments, how they explain what happened, and how they hold others responsible for their actions. Tobias Gerstenberg 🔗 Mon 9:35 a.m. - 9:45 a.m. Session 2: Q&A (Q&A) 🔗 Mon 9:45 a.m. - 10:45 a.m. Poster Session  link » 🔗 Mon 10:45 a.m. - 11:05 a.m. Thomas Icard - A (topo)logical perspective on causal inference (Invited Talk) Thomas Icard 🔗 Mon 11:05 a.m. - 11:25 a.m. Caroline Uhler: TBA (Invited Talk) Caroline Uhler 🔗 Mon 11:25 a.m. - 11:45 a.m. Rosemary Ke - From "What" to "Why": towards causal learning (Invited Talk) Nan Rosemary Ke 🔗 Mon 11:45 a.m. - 12:00 p.m. Session 3: Q&A (Q&A) 🔗 Mon 12:00 p.m. - 12:45 p.m. Judea Pearl - The logic of Causal Inference (Keynote Speaker) 🔗 Mon 12:45 p.m. - 1:00 p.m. Discussion Panel 🔗 Mon 1:00 p.m. - 1:15 p.m. 
Zaffalon, Antonucci, Cabañas - Causal Expectation-Maximisation (Contributed Talk)    Structural causal models are the basic modelling unit in Pearl's causal theory; in principle they allow us to solve counterfactuals, which are at the top rung of the ladder of causation. But they often contain latent variables that limit their application to special settings. This appears to be a consequence of the fact, proven in this paper, that causal inference is NP-hard even in models characterised by polytree-shaped graphs. To deal with such a hardness, we introduce the causal EM algorithm. Its primary aim is to reconstruct the uncertainty about the latent variables from data about categorical manifest variables. Counterfactual inference is then addressed via standard algorithms for Bayesian networks. The result is a general method to approximately compute counterfactuals, be they identifiable or not (in which case we deliver bounds). We show empirically, as well as by deriving credible intervals, that the approximation we provide becomes accurate in a fair number of EM runs. These results lead us finally to argue that there appears to be an unnoticed limitation to the trending idea that counterfactual bounds can often be computed without knowledge of the structural equations. Marco Zaffalon · Alessandro Antonucci · Rafael Cabañas 🔗 Mon 1:15 p.m. - 1:30 p.m. Dominguez Olmedo, Karimi, Schölkopf - On the Adversarial Robustness of Causal Algorithmic Recourse (Contributed Talk)    Algorithmic recourse seeks to provide actionable recommendations for individuals to overcome unfavorable outcomes made by automated decision-making systems. The individual then exerts time and effort to positively change their circumstances. Recourse recommendations should ideally be robust to reasonably small changes in the circumstances of the individual seeking recourse. 
In this work, we formulate the adversarially robust recourse problem and show that methods that offer minimally costly recourse fail to be robust. We restrict ourselves to linear classifiers, and show that the adversarially robust recourse problem reduces to the standard recourse problem for some modified classifier with a shifted decision boundary. Finally, we derive bounds on the extra cost incurred by individuals seeking robust recourse, and discuss how to regulate this cost between the individual and the decision-maker. Ricardo Dominguez-Olmedo · Amir Karimi · Bernhard Schölkopf 🔗 Mon 1:30 p.m. - 1:45 p.m. Javidian, Pandey, Jamshidi - Scalable Causal Domain Adaptation (Contributed Talk)    One of the most important problems in transfer learning is the task of domain adaptation, where the goal is to apply an algorithm trained in one or more source domains to a different (but related) target domain. This paper deals with domain adaptation in the presence of covariate shift while there exist invariances across domains. One of the main limitations of existing causal inference methods for solving this problem is scalability. To overcome this difficulty, we propose SCTL, an algorithm that avoids an exhaustive search and identifies invariant causal features across source and target domains based on Markov blanket discovery. SCTL does not require having prior knowledge of the causal structure, the type of interventions, or the intervention targets. There is an intrinsic locality associated with SCTL that makes SCTL practically scalable and robust because local causal discovery increases the power of computational independence tests and makes the task of domain adaptation computationally tractable. We show the scalability and robustness of SCTL for domain adaptation using synthetic and real data sets in low-dimensional and high-dimensional settings. Mohammad Ali Javidian · Om Pandey · Pooyan Jamshidi 🔗 Mon 1:45 p.m. - 2:00 p.m. 
Cundy, Grover, Ermon - BCD Nets: Scalable Variational Approaches for Bayesian Causal Discovery (Contributed Talk)    A structural equation model (SEM) is an effective framework to reason over causal relationships represented via a directed acyclic graph (DAG). Recent advances have enabled effective maximum-likelihood point estimation of DAGs from observational data. However, a point estimate may not accurately capture the uncertainty in inferring the underlying graph in practical scenarios, wherein the true DAG is non-identifiable and/or the observed dataset is limited. We propose Bayesian Causal Discovery Nets (BCD Nets), a variational inference framework for estimating a distribution over DAGs characterizing a linear-Gaussian SEM. Developing a full Bayesian posterior over DAGs is challenging due to the discrete and combinatorial nature of graphs. We analyse key design choices for scalable VI over DAGs, such as 1) the parametrization of DAGs via an expressive variational family, 2) a continuous relaxation that enables low-variance stochastic optimization, and 3) suitable priors over the latent variables. We provide a series of experiments on real and synthetic data showing that BCD Nets outperform maximum-likelihood methods on standard causal discovery metrics such as structural Hamming distance in low data regimes. Chris Cundy · Aditya Grover · Stefano Ermon 🔗 Mon 2:00 p.m. - 2:20 p.m. Alison Gopnik - Causal Learning in Children and Computational Models (Invited Talk)    Very young children routinely solve causal problems that are still very challenging for machine learning systems. I will outline several exciting recent lines of work looking at young children’s causal reasoning and learning and comparing it to learning in various computational models. This includes work on the selection of relevant test variables, learning abstract and analogical relationships, and, most importantly, techniques for active learning and causal exploration. 
Alison Gopnik 🔗 Mon 2:20 p.m. - 2:40 p.m. Adèle Ribeiro - Effect Identification in Cluster Causal Diagrams (Invited Talk)    A pervasive task found throughout the empirical sciences is to determine the effect of interventions from observational data. It is well-understood that assumptions are necessary to perform such causal inferences, an idea popularized through Cartwright’s motto: "no causes-in, no causes-out." One way of articulating these assumptions is through the use of causal diagrams, which are a special type of graphical model with causal semantics [Pearl, 2000]. The graphical approach has been applied successfully in many settings, but there are still challenges to its use, particularly in complex, high-dimensional domains. In this talk, I will introduce cluster causal diagrams (C-DAGs), a novel causal graphical model that allows for the partial specification of the relationships among variables. C-DAGs provide a simple yet effective way to partially abstract a grouping (cluster) of variables among which causal relationships are not fully understood while preserving consistency with the underlying causal system and the validity of causal identification tools. Reference: https://causalai.net/r77.pdf Adèle Ribeiro 🔗 Mon 2:40 p.m. - 3:00 p.m. Victor Chernozhukov - Omitted Confounder Bias Bounds for Machine Learned Causal Models (Invited Talk) Victor Chernozhukov 🔗 Mon 3:00 p.m. - 3:15 p.m. Session 4: Q&A (Q&A) 🔗 Mon 3:15 p.m. - 3:30 p.m. Closing Remarks 🔗 - Unsupervised Causal Binary Concepts Discovery with VAE for Black-box Model Explanation (Poster) We aim to explain a black-box classifier with the form: "data X is classified as class Y because X has A, B and does not have C", in which A, B, and C are high-level concepts. The challenge is that we have to discover in an unsupervised manner a set of concepts, i.e., A, B and C, that is useful for explaining the classifier. 
We first introduce a structural generative model that is suitable to express and discover such concepts. We then propose a learning process that simultaneously learns the data distribution and encourages certain concepts to have a large causal influence on the classifier output. Our method also allows easy integration of user's prior knowledge to induce high interpretability of concepts. Using multiple datasets, we demonstrate that our method can discover useful binary concepts for explanation. Thien Tran · Kazuto Fukuchi · Youhei Akimoto · Jun Sakuma 🔗 - Encoding Causal Macrovariables (Poster) In many scientific disciplines, coarse-grained causal models are used to explain and predict the dynamics of more fine-grained systems. Naturally, such models require appropriate macrovariables. Automated procedures to detect suitable variables would be useful to leverage increasingly available high-dimensional observational datasets. This work introduces a novel algorithmic approach that is inspired by a new characterisation of causal macrovariables as information bottlenecks between microstates. Its general form can be adapted to address individual needs of different scientific goals. After a further transformation step, the causal relationships between learned variables can be investigated through additive noise models. Experiments on both simulated data and on a real climate dataset are reported. In a synthetic dataset, the algorithm robustly detects the ground-truth variables and correctly infers the causal relationships between them. In a real climate dataset, the algorithm robustly detects two variables that correspond to the two known variations of the El Nino phenomenon. Benedikt Höltgen 🔗 - Amortized Causal Discovery: Learning to Infer Causal Graphs from Time-Series Data (Poster) Standard causal discovery methods must fit a new model whenever they encounter samples from a new underlying causal graph. 
However, these samples often share relevant information – for instance, the dynamics describing the effects of causal relations – which is lost when following this approach. We propose Amortized Causal Discovery, a novel framework that leverages such shared dynamics to learn to infer causal relations from time-series data. This enables us to train a single, amortized model that infers causal relations across samples with different underlying causal graphs, and thus makes use of the information that is shared. We demonstrate experimentally that this approach, implemented as a variational model, leads to significant improvements in causal discovery performance, and show how it can be extended to perform well under hidden confounding. Sindy Löwe · David Madras · Richard Zemel · Max Welling 🔗 - Disentangling Causal Effects from Sets of Interventions in the Presence of Unobserved Confounders (Poster) The ability to answer causal questions is crucial in many domains, as causal inference allows one to understand the impact of interventions. In many applications, only a single intervention is possible at a given time. However, in certain important areas, multiple interventions are concurrently applied. Disentangling the effects of single interventions from jointly applied interventions is a challenging task, especially as simultaneously applied interventions can interact. This problem is made harder still by unobserved confounders, which influence both interventions and outcome. We address this challenge by aiming to learn the effect of a single intervention from both observational data and sets of interventions. We prove that this is not generally possible, but provide identification proofs demonstrating that it can be achieved in certain classes of additive noise models, even in the presence of unobserved confounders. Importantly, we show how to incorporate observed covariates and learn heterogeneous treatment effects conditioned on them for single interventions. 
Olivier Jeunen · Ciaran Gilligan-Lee · Rishabh Mehrotra · Mounia Lalmas 🔗 - Typing assumptions improve identification in causal discovery (Poster) Causal discovery from observational data is a challenging task to which an exact solution cannot always be identified. Under assumptions about the data-generative process, the causal graph can often be identified up to an equivalence class. Proposing new realistic assumptions to circumscribe such equivalence classes is an active field of research. In this work, we propose a new set of assumptions that constrain possible causal relationships based on the nature of the variables. We thus introduce typed directed acyclic graphs, in which variable types are used to determine the validity of causal relationships. We demonstrate, both theoretically and empirically, that the proposed assumptions can result in significant gains in the identification of the causal graph. Philippe Brouillard · Perouz Taslakian · Alexandre Lacoste · Sébastien Lachapelle · Alexandre Drouin 🔗 - Prequential MDL for Causal Structure Learning with Neural Networks (Poster) Learning the structure of Bayesian networks and causal relationships from observations is a common goal in several areas of science and technology. We show that the prequential minimum description length principle (MDL) can be used to derive a practical scoring function for Bayesian networks when flexible and overparametrized neural networks are used to model the conditional probability distributions between observed variables. MDL represents an embodiment of Occam's Razor and we obtain plausible and parsimonious graph structures without relying on sparsity inducing priors or other regularizers which must be tuned. Empirically we demonstrate competitive results on synthetic and real-world data. The score often recovers the correct structure even in the presence of strongly nonlinear relationships between variables; a scenario where prior approaches struggle and usually fail. 
Furthermore we discuss how the prequential score relates to recent work that infers causal structure from the speed of adaptation when the observations come from a source undergoing distributional shift. Jorg Bornschein · Silvia Chiappa · Alan Malek · Nan Rosemary Ke 🔗 - MANM-CS: Data Generation for Benchmarking Causal Structure Learning from Mixed Discrete-Continuous and Nonlinear Data (Poster) In recent years, the growing interest in methods of causal structure learning (CSL) has been confronted with a lack of access to a well-defined ground truth within real-world scenarios to evaluate these methods. Existing synthetic benchmarks are limited in their scope. They are either restricted to a “static” low-dimensional data set or do not allow examining mixed discrete-continuous or nonlinear data. This work introduces the mixed additive noise model that provides a ground truth framework for generating observational data following various distribution models. Moreover, we present our reference implementation MANM-CS that provides easy access and demonstrate how our framework can support researchers and practitioners. Further, we propose future research directions and possible extensions. Johannes Huegle · Christopher Hagedorn · Jonas Umland · Rainer Schlosser 🔗 - DiBS: Differentiable Bayesian Structure Learning (Poster) Bayesian structure learning allows inferring Bayesian network structure from data while reasoning about the epistemic uncertainty, a key element towards enabling active causal discovery and designing interventions in real world systems. In this work, we propose a general, fully differentiable framework for Bayesian structure learning (DiBS) that operates in the continuous space of a latent probabilistic graph representation. Contrary to existing work, DiBS is agnostic to the form of the local conditional distributions and allows for joint posterior inference of both the graph structure and the conditional distribution parameters. 
This makes DiBS directly applicable to posterior inference of nonstandard Bayesian network models, e.g., with nonlinear dependencies encoded by neural networks. Building on recent advances in variational inference, we use DiBS to devise an efficient general purpose method for approximating posteriors over structural models. In evaluations on simulated and real-world data, our method significantly outperforms related approaches to joint posterior inference. Lars Lorch · Jonas Rothfuss · Bernhard Schölkopf · Andreas Krause 🔗 - Learning Neural Causal Models with Active Interventions (Poster) Discovering causal structures from data is a challenging inference problem of fundamental importance in all areas of science. The appealing scaling properties of neural networks have recently led to a surge of interest in differentiable neural network-based methods for learning causal structures from data. So far differentiable causal discovery has focused on static datasets of observational or interventional origin. In this work, we introduce an active intervention-targeting mechanism which enables a quick identification of the underlying causal structure of the data-generating process. Our method significantly reduces the required number of interactions compared with random intervention targeting and is applicable for both discrete and continuous optimization formulations of learning the underlying directed acyclic graph (DAG) from data. We examine the proposed method across a wide range of settings and demonstrate superior performance on multiple benchmarks from simulated to real-world data. Nino Scherrer · Olexa Bilaniuk · Yashas Annadani · Anirudh Goyal ALIAS PARTH GOYAL · Patrick Schwab · Bernhard Schölkopf · Michael Mozer · Yoshua Bengio · Stefan Bauer · Nan Rosemary Ke 🔗 - Identification of Latent Graphs: A Quantum Entropic Approach (Poster) Quantum causality is an emerging field of study that has the potential to greatly advance our understanding of quantum systems. 
In this paper, we put forth a new theoretical framework for merging quantum information science and causal inference by exploiting entropic principles. For this purpose, we leverage the tradeoff between the entropy of hidden cause and conditional mutual information of observed variables to develop a scalable algorithmic approach for inferring causality in the presence of latent confounders (common causes) in quantum systems. As an application, we consider a system of three entangled qubits and transmit the second and third qubits over separate noisy quantum channels. In this model, we validate that the first qubit is a latent confounder and the common cause of the second and third qubits. In contrast, when two entangled qubits are prepared, and one of them is sent over a noisy channel, there is no common confounder. We also demonstrate that the proposed approach outperforms the results of classical causal inference for the Tubingen database when the variables are classical by exploiting quantum dependence between variables through density matrices rather than joint probability distributions. Thus, the proposed approach unifies classical and quantum causal inference in a principled way. Mohammad Ali Javidian · Vaneet Aggarwal · Zubin Jacob 🔗 - Reliable causal discovery based on mutual information supremum principle for finite datasets (Poster) The recent method, MIIC (Multivariate Information-based Inductive Causation), combining constraint-based and information-theoretic frameworks, has been shown to significantly improve causal discovery from purely observational data. Yet, a substantial loss in precision has remained between skeleton and oriented graph predictions for small datasets. Here, we propose and implement a simple modification, named conservative MIIC, based on a general mutual information supremum principle regularized for finite datasets. 
In practice, conservative MIIC rectifies the negative values of regularized (conditional) mutual information used by MIIC to identify (conditional) independence between discrete, continuous or mixed-type variables. This modification is shown to greatly enhance the reliability of predicted orientations, for all sample sizes, with only a small sensitivity loss compared to MIIC original orientation rules. Conservative MIIC is especially interesting to improve the reliability of causal discovery for real-life observational data applications. Vincent Cabeli · Honghao Li · Marcel da Câmara Ribeiro Dantas · Herve Isambert 🔗 - Scalable Causal Domain Adaptation (Poster) One of the most important problems in transfer learning is the task of domain adaptation, where the goal is to apply an algorithm trained in one or more source domains to a different (but related) target domain. This paper deals with domain adaptation in the presence of covariate shift while there exist invariances across domains. One of the main limitations of existing causal inference methods for solving this problem is scalability. To overcome this difficulty, we propose SCTL, an algorithm that avoids an exhaustive search and identifies invariant causal features across source and target domains based on Markov blanket discovery. SCTL does not require having prior knowledge of the causal structure, the type of interventions, or the intervention targets. There is an intrinsic locality associated with SCTL that makes SCTL practically scalable and robust because local causal discovery increases the power of computational independence tests and makes the task of domain adaptation computationally tractable. We show the scalability and robustness of SCTL for domain adaptation using synthetic and real data sets in low-dimensional and high-dimensional settings. 
Mohammad Ali Javidian · Om Pandey · Pooyan Jamshidi 🔗 - Learning preventative and generative causal structures from point events in continuous time (Poster) Many previous accounts of causal structure induction have focused on atemporal contingency data while fewer have described learning on the basis of observations of events unfolding over time. How do people use temporal information to infer causal structures? Here we develop a computational-level framework and propose several algorithmic-level approximations to explain how people impute causal structures from continuous-time event sequences. We compare both normative and process accounts to participant behavior across two experiments. We consider structures combining both generative and preventative causal relationships in the presence of either regular or irregular background noise in the form of spontaneous activations. We find that 1) humans are robustly capable learners in this setting, successfully identifying a variety of ground truth structures but 2) diverging from our computational-level account in ways we can explain with a more tractable simulation and summary statistics approximation scheme. We thus argue that human structure induction from temporal information relies on comparisons between observed patterns and expectations established via mental simulation. Tia Gong 🔗 - Building Object-based Causal Programs for Human-like Generalization (Poster) We present a novel task that measures how people generalize objects' causal powers based on observing a single (Experiment 1) or a few (Experiment 2) causal interactions between object pairs. We propose a computational modeling framework that can synthesize human-like generalization patterns in our task setting, and sheds light on how people may navigate the compositional space of possible causal functions and categories efficiently. 
Our modeling framework combines a causal function generator that makes use of agent and recipient objects' features and relations, and a Bayesian non-parametric inference process to govern the degree of similarity-based generalization. Our model has a natural “resource-rational” variant that outperforms a naive Bayesian account in describing participants, in particular reproducing a generalization-order effect and causal asymmetry observed in our behavioral experiments. We argue that this modeling framework provides a computationally plausible mechanism for real world causal generalization. Bonan Zhao · Chris Lucas 🔗 - On the Robustness of Causal Algorithmic Recourse (Poster) Algorithmic recourse seeks to provide actionable recommendations for individuals to overcome unfavorable outcomes made by automated decision-making systems. The individual then exerts time and effort to positively change their circumstances. Recourse recommendations should ideally be robust to reasonably small changes in the circumstances (similar individuals, updated classifier in light of larger datasets, and updated causal assumptions about the world). In this work, we formulate the robust recourse problem, derive bounds on the extra cost incurred by individuals seeking robust recourse subject to both linear and nonlinear assumptions, and discuss how to regulate this cost between the individual and the decision-maker. Ricardo Dominguez-Olmedo · Amir Karimi · Bernhard Schölkopf 🔗 - Desiderata for Representation Learning: A Causal Perspective (Poster) Representation learning constructs low-dimensional representations to summarize essential features of high-dimensional data. This learning problem is often approached by describing various desiderata associated with learned representations; e.g., that they be non-spurious, efficient, or disentangled. It can be challenging, however, to turn these intuitive desiderata into formal criteria that can be measured and enhanced based on observed data. 
In this paper, we take a causal perspective on representation learning, formalizing non-spuriousness and efficiency (in supervised representation learning) and disentanglement (in unsupervised representation learning) using counterfactual quantities and observable consequences of causal assertions. This yields computable metrics that can be used to assess the degree to which representations satisfy the desiderata of interest and learn non-spurious and disentangled representations from single observational datasets. Yixin Wang · Michael Jordan 🔗 - Scalable Variational Approaches for Bayesian Causal Discovery (Poster) A structural equation model (SEM) is an effective framework to reason over causal relationships represented via a directed acyclic graph (DAG). Recent advances have enabled effective maximum-likelihood point estimation of DAGs from observational data. However, a point estimate may not accurately capture the uncertainty in inferring the underlying graph in practical scenarios, wherein the true DAG is non-identifiable and/or the observed dataset is limited. We propose Bayesian Causal Discovery Nets (BCD Nets), a variational inference framework for estimating a distribution over DAGs characterizing a linear-Gaussian SEM. Developing a full Bayesian posterior over DAGs is challenging due to the discrete and combinatorial nature of graphs. We analyse key design choices for scalable VI over DAGs, such as 1) the parametrization of DAGs via an expressive variational family, 2) a continuous relaxation that enables low-variance stochastic optimization, and 3) suitable priors over the latent variables. We provide a series of experiments on real and synthetic data showing that BCD Nets outperform maximum-likelihood methods on standard causal discovery metrics such as structural Hamming distance in low data regimes. 
Chris Cundy · Aditya Grover · Stefano Ermon 🔗 - Individual treatment effect estimation in the presence of unobserved confounding based on a fixed relative treatment effect (Poster) In healthcare, treatment effect estimates from randomized controlled trials are often reported on a relative scale, for instance as an odds-ratio for binary outcomes. To weigh potential benefits and harms of treatment, this odds-ratio has to be translated to a difference in absolute risk, preferably on an individual patient level. Under the assumption that the relative treatment effect is fixed, it is possible that treatments have widely varying effects on an absolute risk scale. We demonstrate that if this relative treatment effect is known a priori, for example from randomized trials, it is possible to estimate the treatment effect on an absolute scale on an individualized basis, even in the presence of unobserved confounding. We use this assumption both on a standard logistic regression task and on a task with real-world medical images with simulated outcome data, using convolutional neural networks. On both tasks the method performs well. Wouter van Amsterdam · Rajesh Ranganath 🔗 - A Tree-based Model Averaging Approach for Personalized Treatment Effect Estimation from Heterogeneous Data Sources (Poster) Accurately estimating personalized treatment effects within a single study has been challenging due to the limited sample size. Here we propose a tree-based model averaging approach to improve the estimation efficiency of conditional average treatment effects concerning the population of a target research site by leveraging models derived from potentially heterogeneous populations of other sites, but without them sharing individual-level data. To the best of our knowledge, there is no established model averaging approach for distributed data with a focus on improving the estimation of treatment effects. 
Under distributed data networks, we develop an efficient and interpretable tree-based ensemble of personalized treatment effect estimators to join results across hospital sites, while actively modeling for the heterogeneity in data sources through site partitioning. The efficiency of this approach is demonstrated by a study of causal effects of oxygen saturation on hospital mortality and backed up by comprehensive numerical results. Xiaoqing Tan · Lu Tang 🔗 - Multiple Environments Can Reduce Indeterminacy in VAEs (Poster) Parameter and latent variable identifiability in variational autoencoders have received considerable attention recently, due to their empirical success in learning joint probabilities of complex data and their representations. Concurrently, modeling using multiple environments has been suggested for robust causal reasoning. We uncover additional theoretical benefits of multiple environments in the form of a strong identifiability result for a variational autoencoder model with latent covariate shift. We propose a novel learning algorithm that combines empirical Bayes and variational autoencoders, designed for latent variable identifiability without compromising representative power, using multiple environments as a crucial technical and practical tool. Johnny Xi · Benjamin Bloem-Reddy 🔗 - Using Embeddings to Estimate Peer Influence on Social Networks (Poster) We address the problem of using observational data to estimate peer contagion effects, the influence of treatments applied to individuals in a network on the outcomes of their neighbours. A main challenge to such estimation is that homophily - the tendency of connected units to share similar latent traits - acts as an unobserved confounder for contagion effects. Informally, it's hard to tell whether your friends have similar outcomes because they were influenced by your treatment, or whether it's due to some common trait that caused you to be friends in the first place. 
Because these common causes are not usually directly observed, they cannot be simply adjusted for. We describe an approach to perform the required adjustment using node embeddings learned from the network itself. The main aim is to perform this adjustment non-parametrically, without functional form assumptions on either the process that generated the network or the treatment assignment and outcome processes. The key questions we address are: How should the causal effect be formalized? And, when can embedding methods yield causal identification? Irina Cristali · Victor Veitch 🔗 - Using Non-Linear Causal Models to Study Aerosol-Cloud Interactions in the Southeast Pacific (Poster) Aerosol-cloud interactions include a myriad of effects that all begin when aerosol enters a cloud and acts as cloud condensation nuclei (CCN). An increase in CCN results in a decrease in the mean cloud droplet size ($r_e$). The smaller droplet size leads to brighter, more expansive, and longer lasting clouds that reflect more incoming sunlight, thus cooling the Earth. Globally, aerosol-cloud interactions cool the Earth; however, the strength of the effect is heterogeneous over different meteorological regimes. Understanding how aerosol-cloud interactions evolve as a function of the local environment can help us better understand sources of error in our Earth system models, which currently fail to reproduce the observed relationships. In this work we use recent non-linear, causal machine learning methods to study the heterogeneous effects of aerosols on cloud droplet radius. Andrew Jesson · Peter Manshausen · Alyson Douglas · Duncan Watson-Parris · Yarin Gal · Philip Stier 🔗 - Synthesis of Reactive Programs with Structured Latent State (Poster) The human ability to efficiently discover causal theories of their environments from observations is a feat of nature that remains elusive in machines. 
In this work, we attempt to make progress on this frontier by formulating the challenge of causal mechanism discovery from observed data as one of program synthesis. We focus on the domain of time-varying, Atari-like 2D grid worlds, and represent causal models in this domain using a programming language called Autumn. Discovering the causal structure underlying a sequence of observations is equivalent to identifying the program in the Autumn language that generates the observations. We introduce a novel program synthesis algorithm, called AutumnSynth, that approaches this synthesis challenge by integrating standard methods of synthesizing functions with an automata synthesis approach, used to discover the model's latent state. We evaluate our method on a suite of Autumn programs designed to express the richness of the domain, which signals the potential of our formulation. Ria Das · Zenna Tavares · Armando Solar-Lezama · Josh Tenenbaum 🔗 - Causal Inference Using Tractable Circuits (Poster) The aim of this paper is to discuss and draw attention to a recent result which shows that probabilistic inference in the presence of (unknown) causal mechanisms can be tractable for models that have traditionally been viewed as intractable. This result was reported recently in (Darwiche, ECAI 2020) to facilitate model-based supervised learning but it can be interpreted in a causality context as follows. One can compile a non-parametric causal graph into an arithmetic circuit that supports inference in time linear in the circuit size. The circuit is non-parametric so it can be used to estimate parameters from data and to further reason (in linear time) about the causal graph parametrized by these estimates. Moreover, the circuit size can sometimes be independent of the causal graph treewidth, leading to tractable inference on models that have been deemed intractable. 
This has been enabled by a new technique that can exploit causal mechanisms computationally but without needing to know their identities (the classical setup in causal inference). Our goal is to provide a causality-oriented exposure to these new results and to speculate on how they may potentially contribute to more scalable and versatile causal inference. Adnan Darwiche 🔗 - Causal Expectation-Maximisation (Poster) Structural causal models are the fundamental modelling unit in Pearl's causal theory; in principle they allow us to solve counterfactuals, which represent the most expressive level of causal inference. But they most often contain latent variables that limit their application to special settings. In this paper we introduce the causal EM algorithm that aims at reconstructing the uncertainty about the latent variables; based on this, causal inference can approximately be solved via standard algorithms for Bayesian networks. The result is a general method to solve causal inference queries, be they identifiable or not (in which case we deliver bounds), on semi-Markovian structural causal models with categorical variables. We show empirically, as well as by deriving credible intervals, that the approximation we provide becomes accurate in a fair number of EM runs. We show that causal inference is NP-hard also in models characterised by polytree-shaped graphs; this supports developing approximate approaches to causal inference. Finally, we argue that there is possibly an overlooked issue in computing counterfactual bounds without knowledge of the structural equations that might negatively impact known results. Marco Zaffalon · Alessandro Antonucci · Rafael Cabañas 🔗

#### Author Information

##### Bernhard Schölkopf (MPI for Intelligent Systems, Tübingen)

Bernhard Schölkopf received degrees in mathematics (London) and physics (Tübingen), and a doctorate in computer science from the Technical University Berlin. He has researched at AT&T Bell Labs, at GMD FIRST, Berlin, at the Australian National University, Canberra, and at Microsoft Research Cambridge (UK). In 2001, he was appointed scientific member of the Max Planck Society and director at the MPI for Biological Cybernetics; in 2010 he founded the Max Planck Institute for Intelligent Systems. For further information, see www.kyb.tuebingen.mpg.de/~bs.

##### Yoshua Bengio (Mila / U. Montreal)

Yoshua Bengio is a Full Professor in the computer science and operations research department at U. Montreal, scientific director and founder of Mila and of IVADO, Turing Award 2018 recipient, Canada Research Chair in Statistical Learning Algorithms, as well as a Canada AI CIFAR Chair. He pioneered deep learning and in 2018 received the most citations per day among all computer scientists worldwide. He is an officer of the Order of Canada, member of the Royal Society of Canada, was awarded the Killam Prize, the Marie-Victorin Prize and the Radio-Canada Scientist of the year in 2017, and he is a member of the NeurIPS advisory board and co-founder of the ICLR conference, as well as program director of the CIFAR program on Learning in Machines and Brains. His goal is to contribute to uncovering the principles giving rise to intelligence through learning, and to favour the development of AI for the benefit of all.

##### Judea Pearl (UCLA)

Judea Pearl is a professor of computer science and statistics at UCLA. He is a graduate of the Technion, Israel, and joined the faculty of UCLA in 1970, where he conducts research in artificial intelligence, causal inference and philosophy of science. Pearl has authored three books: Heuristics (1984), Probabilistic Reasoning (1988), and Causality (2000; 2009), the last of which won the Lakatos Prize from the London School of Economics. He is a member of the National Academy of Engineering, the American Academy of Arts and Sciences, and a Fellow of the IEEE, AAAI and the Cognitive Science Society. Pearl received the 2008 Benjamin Franklin Medal from the Franklin Institute and the 2011 Rumelhart Prize from the Cognitive Science Society. In 2012, he received the Technion's Harvey Prize and the ACM Alan M. Turing Award.