
Workshop
Causal Machine Learning for Real-World Impact
Nick Pawlowski · Jeroen Berrevoets · Caroline Uhler · Kun Zhang · Mihaela van der Schaar · Cheng Zhang

Fri Dec 02 06:30 AM -- 03:00 PM (PST) @ Room 295 - 296

Causality has a long history that has produced many principled approaches for identifying a causal effect (or even distilling cause from effect). However, these approaches are often restricted to very specific situations and require very specific assumptions. This contrasts sharply with recent advances in machine learning. Real-world problems rarely grant the luxury of strict assumptions, yet still require causal thinking to solve. Armed with the rigor of causality and the can-do attitude of machine learning, we believe the time is ripe to start working towards solving real-world problems.
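The kind of principled, assumption-driven identification the abstract alludes to can be made concrete with a small sketch. The scenario, variable names, and simulated numbers below are illustrative assumptions of this example, not workshop material: given a single observed binary confounder Z satisfying the backdoor criterion, the interventional contrast E[Y|do(X=1)] - E[Y|do(X=0)] is computable from purely observational data by adjustment, while the naive observational contrast is biased.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Toy structural causal model (illustrative assumption): Z -> X, Z -> Y, X -> Y.
z = rng.random(n) < 0.5
x = rng.random(n) < np.where(z, 0.8, 0.2)    # confounder raises treatment rate
y = rng.random(n) < 0.3 + 0.2 * x + 0.3 * z  # true effect of X on Y is +0.2

# Naive observational contrast is confounded by Z (comes out near 0.38 here).
naive = y[x].mean() - y[~x].mean()

# Backdoor adjustment: E[Y|do(X=v)] = sum_z P(Z=z) * E[Y | X=v, Z=z].
def do_x(val):
    return sum(
        (z == zv).mean() * y[(x == val) & (z == zv)].mean()
        for zv in (False, True)
    )

ate = do_x(True) - do_x(False)
print(f"naive contrast: {naive:.2f}, adjusted ATE: {ate:.2f}")  # ATE ~ 0.20
```

The adjusted estimate recovers the true +0.2 effect only because the backdoor assumption (Z blocks all confounding paths) holds by construction; with an unobserved confounder, no amount of data would fix the naive estimate, which is exactly the gap between causal assumptions and purely statistical learning.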

Fri 6:30 a.m. - 6:45 a.m.  Opening Remarks · Cheng Zhang · Mihaela van der Schaar
Fri 6:45 a.m. - 7:15 a.m.  Learning Causal Structures and Causal Representations from Data ( Talk ) · Peter Spirtes
Fri 7:15 a.m. - 8:00 a.m.  Panel Discussion · Cheng Zhang · Mihaela van der Schaar · Ilya Shpitser · Aapo Hyvarinen · Yoshua Bengio · Bernhard Schölkopf
Fri 8:00 a.m. - 8:45 a.m.  Poster Session
Fri 8:00 a.m. - 8:30 a.m.  Coffee Break
Fri 8:45 a.m. - 9:05 a.m.  Causal Discovery for Real World Applications: A Case Study ( Talk ) · Stefan Bauer
Fri 9:05 a.m. - 9:25 a.m.  Learning Neural Causal Models ( Talk ) · Nan Rosemary Ke
Fri 9:30 a.m. - 9:45 a.m.  Discrete Learning of DAGs via Backpropagation ( Talk ) · Andrew Wren · Pasquale Minervini · Luca Franceschi · Valentina Zantedeschi
Fri 9:45 a.m. - 10:00 a.m.  Local Causal Discovery for Estimating Causal Effects ( Talk ) · Shantanu Gupta · David Childers · Zachary Lipton
Fri 10:00 a.m. - 10:15 a.m.  Exploiting Neighborhood Interference with Low Order Interactions under Unit Randomized Design ( Talk ) · Mayleen Cortez · Matthew Eichhorn · Christina Yu
Fri 10:15 a.m. - 10:30 a.m.  Hydranet: A Neural Network for the Estimation of Multi-valued Treatment Effects ( Talk ) · Borja Velasco · Jesus Cerquides · Josep Arcos
Fri 10:30 a.m. - 11:45 a.m.  Lunch Break
Fri 10:30 a.m. - 11:45 a.m.  Poster Session
Fri 11:45 a.m. - 12:15 p.m.  Causal ML for Medicines R&D ( Talk ) · Jim Weatherall
Fri 12:15 p.m. - 12:45 p.m.  Planning and Learning from Interventions in the Context of Cancer Immunotherapy ( Talk ) · Caroline Uhler
Fri 12:45 p.m. - 1:30 p.m.  Coffee Break
Fri 12:45 p.m. - 1:30 p.m.  Poster Session
Fri 1:30 p.m. - 2:00 p.m.  Stable Discovery of Interpretable Subgroups via Calibration in Causal Studies ( Talk ) · Bin Yu
Fri 2:00 p.m. - 2:15 p.m.  A Design-Based Riesz Representation Framework for Randomized Experiments ( Talk ) · Christopher Harshaw · Yitan Wang · Fredrik Sävje
Fri 2:15 p.m. - 2:30 p.m.  A Causal AI Suite for Decision-Making ( Talk ) · Emre Kiciman
Fri 2:30 p.m. - 2:45 p.m.  Causal Analysis of the TOPCAT Trial: Spironolactone for Preserved Cardiac Function Heart Failure ( Talk ) · Francesca Raimondi · Tadhg O'Keeffe · Andrew Lawrence · Tamara Stemberga · Andre Franca · Maksim Sipos · Javed Butler · Shlomo Ben-Haim
Fri 2:45 p.m. - 3:00 p.m.  Closing Remarks · Cheng Zhang · Mihaela van der Schaar

Posters

Evaluating the Impact of Geometric and Statistical Skews on Out-Of-Distribution Generalization Performance ( Poster )
Out-of-distribution (OOD) or domain generalization is the problem of generalizing to unseen distributions. Recent work suggests that the marginal difficulty of generalizing to OOD over in-distribution data (the OOD-ID generalization gap) is due to spurious correlations, which arise from statistical and geometric skews and can be addressed by careful data augmentation and class balancing. We observe that after constructing a dataset in which we remove all conceivable sources of spurious correlation between interpretable factors, classifiers still fail to close the OOD-ID generalization gap.
Aengus Lynch · Jean Kaddour · Ricardo Silva

Targeted Causal Elicitation ( Poster )
We look at the problem of learning causal structure for a fixed downstream causal effect optimization task. In contrast to previous work, which often focuses on running interventional experiments, we consider an often overlooked source of information: a domain expert. In the Bayesian setting this amounts to augmenting the likelihood with a user model whose parameters account for possible biases of the expert. Such a model allows for active elicitation in a manner that is most informative to the optimization task at hand.
Nazaal Ibrahim · ST John · Zhigao Guo · Samuel Kaski

Using Interventions to Improve Out-of-Distribution Generalization of Text-Matching Systems ( Poster )
Given a user's input text, text-matching recommender systems output relevant items by comparing the input text to available items' descriptions, such as product-to-product recommendation on e-commerce platforms. As users' interests and item inventory are expected to change, it is important for a text-matching system to generalize to data shifts, a task known as out-of-distribution (OOD) generalization. However, we find that the popular approach of fine-tuning a large, base language model on paired item relevance data (e.g., user clicks) can be counter-productive for OOD generalization. For a product recommendation task, fine-tuning obtains worse accuracy than the base model when recommending items in a new category or for a future time period. To explain this generalization failure, we consider an intervention-based importance metric, which shows that a fine-tuned model captures spurious correlations and fails to learn the causal features that determine the relevance between any two text inputs. Moreover, standard methods for causal regularization do not apply in this setting, because unlike in images, there exist no universally spurious features in a text-matching task (the same token may be spurious or causal depending on the text it is being matched to). For OOD generalization on text inputs, therefore, we highlight a different goal: avoiding high importance scores for certain features. We do so using an intervention-based regularizer that constrains the causal effect of any token on the model's relevance score to be similar to the base model. Results on Amazon product and 3 question recommendation datasets show that our proposed regularizer improves generalization for both in-distribution and OOD evaluation, especially in difficult scenarios when the base model is not accurate.
Parikshit Bansal · Yashoteja Prabhu · Emre Kiciman · Amit Sharma

Exploiting Selection Bias on Underspecified Tasks in Large Language Models ( Poster )
In this paper we motivate the causal mechanisms behind sample selection induced collider bias (selection collider bias) that can cause Large Language Models (LLMs) to learn unconditional dependence between entities that are unconditionally independent in the real world. We show that selection collider bias can become amplified in underspecified learning tasks, and although it is difficult to overcome, we describe a method to exploit the resulting spurious correlations to determine when a model may be uncertain about its prediction. We demonstrate an uncertainty metric that matches human uncertainty in tasks with gender pronoun underspecification on an extended version of the Winogender Schemas evaluation set, and we provide online demos where users can evaluate spurious correlations and apply our uncertainty metric to their own texts and models. Finally, we generalize our approach to address a wider range of prediction tasks.
Emily McMilin

Making the World More Equal, One Ride at a Time: Studying Public Transportation Initiatives Using Interpretable Causal Inference ( Poster )
The goal of low-income fare subsidy programs is to increase equitable access to public transit, and in doing so, increase access to jobs, housing, education and other essential resources. King County Metro, one of the largest transit providers focused on equitable public transit, has been innovative in launching new programs for low-income riders. However, due to the observational nature of data on ridership behavior in King County, evaluating the effectiveness of such innovative policies is difficult.
In this work, we used seven datasets from a variety of sources and a recent interpretable machine-learning-based causal inference matching method called FLAME to evaluate one of King County Metro's largest programs implemented in 2020: the Subsidized Annual Pass (SAP). Using FLAME, we construct high-quality matched groups and identify features that are important for predicting ridership and re-enrollment. Our analysis provides clear and insightful feedback for policy-makers. In particular, we found that SAP is effective in increasing long-term ridership and re-enrollment. Notably, there are pronounced positive treatment effects in populations that have higher access to public transit and jobs. Treatment effects are also more pronounced in the Asian population and in individuals ages 65+. Insights from this work can help inform public transportation policy decisions and generalize to other cities and other forms of transportation.
Gaurav Rajesh Parikh · Albert Sun · Jenny Huang · Lesia Semenova · Cynthia Rudin

Non-Stationary Causal Bandits ( Poster )
The causal bandit problem is an extension of the conventional multi-armed bandit problem in which the arms available are not independent of each other, but rather are correlated within a Bayesian graph. This extension is more natural, since day-to-day cases of bandits often have a causal relation between their actions and hence are better represented as a causal bandit problem. Moreover, the class of conventional multi-armed bandits lies within that of causal bandits, since any instance of the former can be modeled in the latter setting by using a Bayesian graph with all independent variables.
However, it is generally assumed that the probabilistic distributions in the Bayesian graph are stationary. In this paper, we design non-stationary causal bandit algorithms by equipping the current state of the art (mainly causal UCB, causal Thompson Sampling, causal KL UCB and Online Causal TS) with the restarted Bayesian online change-point detector (RBOCPD). Experimental results show the minimization of the regret when using optimal change-point detection.
Reda Alami

Can Active Sampling Reduce Causal Confusion in Offline Reinforcement Learning? ( Poster )
Causal confusion is a phenomenon where an agent learns a policy that reflects imperfect spurious correlations in the data. The resulting causally confused behaviors may appear desirable during training but may fail at deployment. This problem is exacerbated in domains such as robotics, with potentially large gaps between open- and closed-loop performance of an agent. In such cases, a causally confused model may appear to perform well according to open-loop metrics but fail catastrophically when deployed in the real world. In this paper, we conduct the first study of causal confusion in offline reinforcement learning and hypothesize that selectively sampling data points that may help disambiguate the underlying causal mechanism of the environment may alleviate causal confusion. To investigate this hypothesis, we consider a set of simulated setups to study causal confusion and the ability of active sampling schemes to reduce its effects. We provide empirical evidence that random and active sampling schemes are able to consistently reduce causal confusion as training progresses and that active sampling is able to do so more efficiently than random sampling.
Gunshi Gupta · Tim G. J. Rudner · Rowan McAllister · Adrien Gaidon · Yarin Gal

A Causal AI Suite for Decision-Making ( Poster )
Critical data science and decision-making questions across a wide variety of domains are fundamentally causal questions. The causal AI research area is still early in its development, however, and as with any technology area, will require many more advances and iterative practical deployments to reach its full impact. We present a suite of open-source causal tools and libraries that aims to simultaneously provide core causal AI functionality to practitioners and create a platform for research advances to be rapidly deployed. In this paper, we describe our contributions towards such a comprehensive causal AI suite of tools and libraries, its design, and lessons we are learning from its growing adoption. We hope that our work accelerates use-inspired basic research for improvement of causal AI.
Emre Kiciman · Eleanor Dillon · Darren Edge · Adam Foster · Joel Jennings · Chao Ma · Robert Ness · Nick Pawlowski · Amit Sharma · Cheng Zhang

Unit Selection: Learning Benefit Function from Finite Population Data ( Poster )
The unit selection problem is to identify a group of individuals who are most likely to exhibit a desired mode of behavior, for example, selecting individuals who would respond one way if incentivized and a different way if not. The unit selection problem consists of evaluation and search subproblems. Li and Pearl defined the "benefit function" to evaluate the average payoff of selecting a certain individual with given characteristics. The search subproblem is then to design an algorithm to identify the characteristics that maximize the above benefit function. The hardness of the search subproblem arises due to the large number of characteristics available for each individual and the sparsity of the data available in each cell of characteristics.
In this paper, we present a machine learning framework that uses the bounds of the benefit function that are estimable from the finite population data to learn the bounds of the benefit function for each cell of characteristics. We can therefore easily obtain the characteristics that maximize the benefit function.
Ang Li · Song Jiang · Yizhou Sun · Judea Pearl

Neural Bayesian Network Understudy ( Poster )
Bayesian networks may be appealing for clinical decision-making due to their inclusion of causal knowledge, but their practical adoption remains limited as a result of their inability to deal with unstructured data. While neural networks do not have this limitation, they are not interpretable and are inherently unable to deal with causal structure in the input space. Our goal is to build neural networks that combine the advantages of both approaches. Motivated by the perspective of injecting causal knowledge while training such neural networks, this work presents initial steps in that direction. We demonstrate how a neural network can be trained to output conditional probabilities, providing approximately the same functionality as a Bayesian network. Additionally, we propose two training strategies that allow encoding the independence relations inferred from a given causal structure into the neural network. We present initial results in a proof-of-concept setting, showing that the neural model acts as an understudy to its Bayesian network counterpart, approximating its probabilistic and causal properties.
Paloma Rabaey · Cedric De Boom · Thomas Demeester

Hydranet: A Neural Network for the Estimation of Multi-valued Treatment Effects ( Poster )
The clinical effectiveness aspect within the Health Technology Assessment (HTA) process often faces causal questions where the treatment variable can take multiple values.
Nevertheless, most developments in causal inference algorithms that employ machine learning happen in binary treatment settings. In addition, there is a big gap between the algorithmic state of the art and the applied state of the art in this field. In this paper, we select a state-of-the-art, neural network-based algorithm for binary treatment effect estimation and generalize it to a multi-valued treatment setting, testing it with semi-synthetic data that could mimic an HTA process. We obtain an estimator with desirable asymptotic properties and good results in experiments. To the best of our knowledge, this work opens ground for the benchmarking of neural network-based algorithms for multi-valued treatment effect estimation.
Borja Velasco · Jesus Cerquides · Josep Arcos

Deep End-to-end Causal Inference ( Poster )
Causal inference is essential for data-driven decision making across domains such as business engagement, medical treatment and policy making. However, research on causal discovery has evolved separately from causal inference, preventing straightforward combination of methods from both fields. In this work, we develop Deep End-to-end Causal Inference (DECI), a non-linear additive noise model with neural network functional relationships that takes in observational data and can perform both causal discovery and inference, including conditional average treatment effect (CATE) estimation. We provide a theoretical guarantee that DECI can asymptotically recover the ground truth causal graph and treatment effects when correctly specified. Our results show the competitive performance of DECI when compared to relevant baselines for both causal discovery and (C)ATE estimation in over a thousand experiments on both synthetic datasets and causal machine learning benchmarks.
Tomas Geffner · Javier Antorán · Adam Foster · Wenbo Gong · Chao Ma · Emre Kiciman · Amit Sharma · Angus Lamb · Martin Kukla · Nick Pawlowski · Miltiadis Allamanis · Cheng Zhang

Contrastive Unsupervised Learning of World Model with Invariant Causal Features ( Poster )
In this paper we present a world model which learns causal features using the invariance principle. In particular, we use contrastive unsupervised learning to learn the invariant causal features, which enforces invariance across augmentations of irrelevant parts or styles of the observation. World-model-based reinforcement learning methods independently optimize representation learning and the policy, so a naive contrastive loss implementation collapses due to a lack of supervisory signals to the representation learning module. We propose an intervention-invariant auxiliary task to mitigate this issue. Specifically, we utilize depth prediction to explicitly enforce the invariance and use data augmentation as style intervention on the RGB observation space. Our design leverages unsupervised representation learning to learn the world model with invariant causal features. Our proposed method significantly outperforms current state-of-the-art model-based and model-free reinforcement learning methods on out-of-distribution point navigation tasks on the iGibson dataset. Moreover, our proposed model excels at sim-to-real transfer of our perception learning module.
Rudra PK Poudel · Harit Pandya · Roberto Cipolla

Toward Fair and Robust Optimal Treatment Regimes ( Poster )
We propose a new framework for robust nonparametric estimation of optimal treatment regimes under flexible fairness constraints. Under standard regularity conditions we show that the resulting estimators possess the double robustness property. We use this framework to characterize the trade-off between fairness and the maximum welfare that is achievable by the optimal treatment policy.
Kwangho Kim · Jose Zubizarreta

Counterfactual Generation Under Confounding ( Poster )
A machine learning model, under the influence of observed or unobserved confounders in the training data, can learn spurious correlations and fail to generalize when deployed. For image classifiers, augmenting a training dataset using counterfactual examples has been empirically shown to break spurious correlations. However, the counterfactual generation task itself becomes more difficult as the level of confounding increases. Existing methods for counterfactual generation under confounding consider a fixed set of interventions (e.g., texture, rotation) and are not flexible enough to capture diverse data-generating processes. We formally characterize the adverse effects of confounding on any downstream task and show that the correlation between generative factors can be used to quantitatively measure confounding. To minimize such correlation, we propose a counterfactual generation method that learns to modify the value of any attribute in an image and generate new images. Our method is computationally efficient, simple to implement, and works well for any number of generative factors and confounding variables. Our experimental results on both synthetic (MNIST variants) and real-world (CelebA) datasets show the usefulness of our approach.
Abbavaram Gowtham Reddy · Saloni Dash · Amit Sharma · Vineeth N Balasubramanian

A Causal Inference Framework for Network Interference with Panel Data ( Poster )
We propose a framework for causal inference with panel data in the presence of network interference and unobserved confounding. Key to our approach is a novel latent factor model that takes into account network interference and generalizes the factor models typically used in panel data settings.
We propose an estimator, the Network Synthetic Interventions estimator, and show that it consistently estimates the counterfactual outcomes for a unit under an arbitrary set of treatments, if certain observation patterns hold in the data. We corroborate our theoretical findings with simulations. In doing so, our framework extends the Synthetic Control and Synthetic Interventions methods to incorporate network interference.
Sarah Cen · Anish Agarwal · Christina Yu · Devavrat Shah

Improving the Efficiency of the PC Algorithm by Using Model-Based Conditional Independence Tests ( Poster )
Learning causal structure is useful in many areas of artificial intelligence, such as planning, robotics, and explanation. Constraint-based and hybrid structure learning algorithms such as PC use conditional independence (CI) tests to learn a causal structure. Traditionally, constraint-based algorithms perform the CI tests with a preference for smaller conditioning sets, partly because the statistical power of conventional CI tests declines substantially as the size of the conditioning set increases. However, many modern conditional independence tests are model-based, and these tests use well-regularized models that can perform well even with very large conditioning sets. This suggests an intriguing new strategy for constraint-based algorithms, one that may reduce the total number of CI tests performed: test variable pairs with large conditioning sets first, as a pre-processing step that finds some conditional independencies quickly, before moving on to the conventional strategy of testing with incrementally larger conditioning sets (beginning with marginal independence tests). We propose such a pre-processing step for the PC algorithm which relies on performing CI tests on a few randomly selected large conditioning sets.
We perform an empirical analysis on directed acyclic graphs (DAGs) that correspond to real-world systems, and both an empirical and theoretical analysis for Erdős-Rényi DAGs. Our results show that the PC algorithm with our pre-processing step performs far fewer CI tests than the original PC algorithm: between 0.5% and 20% of the CI tests that the PC algorithm alone performs. The efficiency gains are particularly significant for the DAGs corresponding to real-world systems.
Erica Cai · Andrew McGregor · David Jensen

Identifying Causal Effects Of Exercise On Irregular Heart Rhythm Events Using Wearable Device Data ( Poster )
Wearable devices can passively monitor user health by tracking a set of metrics, including activity and heart rate. The Apple Watch introduced Irregular Rhythm Notifications (IRNs), which alert a user when the watch detects an arrhythmia over a sustained period that is highly suggestive of atrial fibrillation (AFib). Arrhythmias like AFib are often episodic, and episodes are suspected to have triggers like sleep changes, alcohol intake, or exercise. We study the proximal connection between Apple Exercise Minutes, a measure of moderate to strenuous exercise, and IRN events, using a causal observational study with data from the Apple Heart and Movement Study. We find that while increased exercise levels have a broadly protective effect, a large daily increase in exercise relative to a user's baseline increases the risk of an IRN on that day.
Lauren Hannah · Adam Bouyamourn

On Causal Rationalization ( Poster )
With recent advances in natural language processing, rationalization has become an essential self-explaining technique that disentangles the black box by selecting a subset of input texts to account for the major variation in prediction.
Yet, existing association-based approaches to rationalization cannot identify true rationales when two or more rationales are highly intercorrelated and thus provide a similar contribution to prediction accuracy, so-called spuriousness. To address this limitation, we leverage two causal desiderata, non-spuriousness and efficiency, in rationalization from the causal inference perspective. We formally define the probability of causation in the rationale model, with its identification established as the main component of learning necessary and sufficient rationales. The superior performance of our causal rationalization is demonstrated on real-world review and medical datasets with extensive experiments compared to state-of-the-art methods.
Wenbo Zhang · Tong Wu · Yunlong Wang · Yong Cai · Hengrui Cai

The Counterfactual-Shapley Value: Attributing Change in System Metrics ( Poster )
Given an unexpected change in the output metric of a large-scale system, it is important to answer why the change occurred: which inputs caused the change in the metric? A key component of such an attribution question is estimating the counterfactual: the (hypothetical) change in the system metric due to a specified change in a single input. However, due to inherent stochasticity and complex interactions between parts of the system, it is difficult to model an output metric directly. We utilize the computational structure of a system to break up the modelling task into sub-parts, such that each sub-part corresponds to a more stable mechanism that can be modelled accurately over time. Using the system's structure also helps to view the metric as a computation over a structural causal model (SCM), thus providing a principled way to estimate counterfactuals.
Specifically, we propose a method to estimate counterfactuals using time-series predictive models and construct an attribution score, CF-Shapley, that is consistent with desirable axioms for attributing an observed change in the output metric. Unlike past work on causal Shapley values, our proposed method can attribute a single observed change in output (rather than a population-level effect) and thus provides more accurate attribution scores when evaluated on simulated datasets. As a real-world application, we analyze a query-ad matching system with the goal of attributing observed changes in a metric for ad matching density. Attribution scores explain how query volume and ad demand from different query categories affect the ad matching density, uncovering the role of external events (e.g., "Cheetah Day") in driving the matching density.
Amit Sharma · Hua Li · Jian Jiao

Beyond Central Limit Theorem for Higher Order Inference in Batched Bandits ( Poster )
Adaptive experiments have been gaining traction in a variety of domains, stimulating a growing literature on post-experimental statistical inference for data collected from such designs. Prior work constructs confidence intervals mainly based on two types of methods: (i) martingale concentration inequalities and (ii) asymptotic approximation of the distribution of test statistics; this work contributes to the second kind. Current asymptotic approximation methods, however, mostly rely on first-order limit theorems, which can have a slow convergence rate in a data-poor regime. Besides, established results often rely on conditions that noises are well-behaved, which can be problematic when real-world instances are heavy-tailed or asymmetric. In this paper, we propose the first higher-order asymptotic expansion formula for inference on adaptively collected data, which generalizes normal approximation to the distribution of standard test statistics.
Our theorem relaxes assumptions on the noise distribution and enjoys a fast convergence rate to accommodate small sample sizes. We complement our results with promising empirical performance in simulations.
Yechan Park · Ruohan Zhan · Nakahiro Yoshida

Valid Inference after Causal Discovery ( Poster )
Causal graph discovery and causal effect estimation are two fundamental tasks in causal inference. While many methods have been developed for each task individually, statistical challenges arise when applying these methods jointly: estimating causal effects after running causal discovery algorithms on the same data leads to "double dipping," invalidating the coverage guarantees of classical confidence intervals. To this end, we develop tools for valid post-causal-discovery inference. One key contribution is a randomized version of the greedy equivalence search (GES) algorithm, which permits a valid, distribution-free correction of classical confidence intervals. We show that a naive combination of causal discovery and subsequent inference algorithms typically leads to highly inflated miscoverage rates; at the same time, our noisy GES method provides reliable coverage control while achieving more accurate causal graph recovery than data splitting.
Paula Gradu · Tijana Zrnic · Yixin Wang · Michael Jordan

Can Large Language Models Build Causal Graphs? ( Poster )
Building causal graphs can be a laborious process. To ensure all relevant variables have been captured, researchers often have to discuss with clinicians and experts while also reviewing extensive relevant medical literature. By encoding common and medical knowledge, large language models (LLMs) represent an opportunity to ease this process by automatically scoring edges (i.e., connections between two variables) in potential graphs. LLMs, however, have been shown to be brittle to the choice of probing words, context, and prompt that the user employs.
In this work, we evaluate whether LLMs can be a useful tool in speeding up causal graph development.
Stephanie Long · Tibor Schuster · Alexandre Piche

Counterfactual Decision Support Under Treatment-Conditional Outcome Measurement Error ( Poster )
Growing work in algorithmic decision support proposes methods for combining predictive models with human judgment to improve decision quality. A challenge that arises in this setting is predicting the risk of a decision-relevant target outcome under multiple candidate actions. While counterfactual prediction techniques have been developed for these tasks, current approaches do not account for measurement error in observed labels. This is a key limitation, because in many domains observed labels (e.g., medical diagnoses, defendant re-arrest) serve as a proxy for the target outcome of interest (e.g., biological medical outcomes, recidivism). We develop a method for counterfactual prediction of target outcomes observed under treatment-conditional outcome measurement error (TC-OME). Our method minimizes risk with respect to target potential outcomes given access to observational data and estimates of measurement error parameters. We also develop a method for estimating error parameters in cases where these are unknown in advance. Through a synthetic evaluation, we show that our approach achieves performance parity with an oracle model when measurement error parameters are known and retains performance given moderate bias in error parameter estimates.
Luke Guerdan · Amanda Coston · Kenneth Holstein · Steven Wu

Causal Estimation for Text Data with (Apparent) Overlap Violations ( Poster )
Consider the problem of estimating the causal effect of some attribute of a text document; for example: what effect does writing a polite vs. rude email have on response time?
To estimate a causal effect from observational data, we need to adjust for confounding aspects of the text that affect both the treatment and outcome---e.g., the topic or writing level of the text. These confounding aspects are unknown a priori, so it seems natural to adjust for the entirety of the text (e.g., using a transformer). However, causal identification and estimation procedures rely on the assumption of overlap: for all levels of the adjustment variables, there is randomness leftover so that every unit could have (not) received treatment. Since the treatment here is itself an attribute of the text, it is perfectly determined, and overlap is apparently violated. The purpose of this paper is to show how to handle causal identification and obtain robust causal estimation in the presence of apparent overlap violations. In brief, the idea is to use supervised representation learning to produce a data representation that preserves confounding information while eliminating information that is only predictive of the treatment. This representation then suffices for adjustment and satisfies overlap. Adapting results on non-parametric estimation, we find that this procedure is robust to conditional outcome misestimation, yielding a low-bias estimator with valid uncertainty quantification under weak conditions. Empirical results show strong improvements in bias and uncertainty quantification relative to the natural baseline. Link » Lin Gui · Victor Veitch 🔗 - Initial Results for Pairwise Causal Discovery Using Quantitative Information Flow ( Poster )  link » Pairwise Causal Discovery is the task of determining causal, anti-causal, confounded or independence relationships from real-world datasets (i.e., pairs of variables). 
Over the last few years, this challenging task has motivated not only the development of novel machine learning models aimed at solving it, but also discussions on how learning the causal direction of variables may benefit machine learning overall. In this paper, we show that Quantitative Information Flow (QIF), a measure usually employed for quantifying leakages of information from a system to an attacker, shows promising results as features for the causal discovery task. In particular, experiments with real-world datasets indicate that QIF is statistically tied to the state of the art. Our initial results motivate further inquiries into how QIF relates to causality and what its limitations are. Link » Felipe Giori · Flavio Figueiredo 🔗 - Do-Operation Guided Causal Representation Learning with Reduced Supervision Strength ( Poster )  link » Causal representation learning has been proposed to encode causal relationships between factors present in high-dimensional data. Existing methods are limited to being trained and fully supervised by ground-truth generative factors. In this paper, we seek to reduce supervision strength by leveraging interventions on either the cause factor or the effect factor. Applying interventions on cause factors and effect factors leads to different results: intervening on effect factors changes the causal graph, whereas intervening on cause factors does not change the relationships. Such an intervention is also called a do-operation. Based on this property of the do-operation, we propose a framework called Do-VAE, which implements the do-operation by swapping latent cause factors and effect factors encoded from a pair of inputs, and utilizes the supervision signal from the pair by comparing the original inputs and their reconstructions. Moreover, we also identify the inadequacy of existing causal representation metrics and introduce new metrics for better evaluation.
Link » Jiageng Zhu · Hanchen Xie · Wael Abd-Almageed 🔗 - Mitigating input-causing confounding in multimodal learning via the backdoor adjustment ( Poster )  link » We adopt a causal perspective to address why multimodal learning often performs worse than unimodal learning. We put forth a structural causal model (SCM) for which multimodal learning is preferable over unimodal learning. In this SCM, which we call the multimodal SCM, a latent variable causes the inputs, and the inputs cause the target. We refer to this latent variable as the input-causing confounder. By conditioning on all inputs, multimodal learning $d$-separates the input-causing confounder and the target, resulting in a causal model that is more robust than the statistical model learned by unimodal learning. We argue that multimodal learning fails in practice because our finite datasets appear to come from an alternative SCM, which we call the spurious SCM. In the spurious SCM, the input-causing confounder and target are conditionally dependent given the inputs. This means that multimodal learning no longer $d$-separates the input-causing confounder and the target, and fails to estimate a causal model. We use a latent variable model to model the input-causing confounder, and test whether the undesirable dependence with the target is present in the data. We then use the same model to remove this dependence and estimate a causal model, which corresponds to the backdoor adjustment. We use synthetic data experiments to validate our claims. Link » Taro Makino · Krzysztof Geras · Kyunghyun Cho 🔗 - Generalized Synthetic Control Method with State-Space Model ( Poster )  link » The synthetic control method (SCM) is a widely used approach to assess the treatment effect of a point-wise intervention for cross-sectional time-series data. The goal of SCM is to approximate the counterfactual outcomes of the treated unit as a combination of the control units' observed outcomes.
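The classic SCM weighting just described can be sketched as a simplex-constrained least-squares fit over pre-treatment outcomes. This is a toy illustration with simulated data and projected gradient descent, not the GSC-SSM method from the abstract; all variable names and sizes are hypothetical:

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection onto {w : w >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css - 1)[0][-1]
    theta = (css[rho] - 1) / (rho + 1.0)
    return np.maximum(v - theta, 0.0)

def scm_weights(Y0, y1, iters=5000):
    """Projected gradient descent for min_w ||Y0 w - y1||^2 over the simplex."""
    J = Y0.shape[1]
    w = np.full(J, 1.0 / J)
    step = 1.0 / np.linalg.norm(Y0, 2) ** 2   # 1 / Lipschitz constant of the gradient
    for _ in range(iters):
        grad = Y0.T @ (Y0 @ w - y1)
        w = project_simplex(w - step * grad)
    return w

rng = np.random.default_rng(0)
Y0 = rng.normal(size=(50, 4))             # pre-treatment outcomes of 4 control units
w_true = np.array([0.5, 0.3, 0.2, 0.0])   # hypothetical true convex combination
y1 = Y0 @ w_true                          # treated unit's pre-treatment outcomes
w = scm_weights(Y0, y1)                   # recovered synthetic-control weights
```

The learned weights then extrapolate the treated unit's counterfactual into the post-treatment period; GSC-SSM generalizes exactly this step by letting the weights vary over time.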
Many studies propose a linear factor model as a parametric justification for the SCM that assumes the synthetic control weights are invariant across time. However, such an assumption does not always hold in practice. We propose a generalized SCM with time-varying weights based on a state-space model (GSC-SSM), allowing for a more flexible and accurate construction of counterfactual series. GSC-SSM recovers the classic SCM when the hidden weights are specified as constant. It applies Bayesian shrinkage for two-way sparsity of the estimated weights across both the donor pool and time. On the basis of our method, we shed light on the role of auxiliary covariates, on nonlinear and non-Gaussian state-space models, and on prediction intervals based on time-series forecasting. We apply GSC-SSM to investigate the impact of German reunification and of a mandatory certificate on COVID-19 vaccine compliance. Link » Junzhe Shao · Mingzhang Yin · Xiaoxuan Cai · Linda Valeri 🔗 - On counterfactual inference with unobserved confounding ( Poster )  link » Given an observational study with $n$ independent but heterogeneous units and one $p$-dimensional sample per unit containing covariates, interventions, and outcomes, our goal is to learn the counterfactual distribution for each unit. We consider studies with unobserved confounding, which introduces statistical biases between interventions and outcomes and exacerbates the heterogeneity across units. Modeling the underlying joint distribution as an exponential family and under suitable conditions, we reduce learning the $n$ unit-level counterfactual distributions to learning $n$ exponential family distributions with heterogeneous parameters from only one sample per distribution. We introduce a convex objective that pools all $n$ samples to jointly learn all $n$ parameters, and provide a unit-wise mean squared error bound that scales linearly with the metric entropy of the parameter space.
For example, when the parameters are $s$-sparse linear combinations of $k$ known vectors, the error is $O(s\log k/p)$. En route, we derive sufficient conditions for compactly supported distributions to satisfy the logarithmic Sobolev inequality. Link » Abhin Shah · Raaz Dwivedi · Devavrat Shah · Gregory Wornell 🔗 - Identifying causes of Pyrocumulonimbus (PyroCb) ( Poster )  link » A first causal discovery analysis from observational data of pyroCb (storm clouds generated from extreme wildfires) is presented. Invariant Causal Prediction was used to develop tools to understand the causal drivers of pyroCb formation. This includes a conditional independence test for testing $Y \perp\!\!\!\perp E \mid X$ for binary variable $Y$ and multivariate, continuous variables $X$ and $E$, and a greedy-ICP search algorithm that relies on fewer conditional independence tests to obtain a smaller, more manageable set of causal predictors. With these tools we identified a subset of seven causal predictors which are plausible when contrasted with domain knowledge: surface sensible heat flux, relative humidity at 850 hPa, a component of wind at 250 hPa, 13.3 µm thermal emissions, convective available potential energy, and altitude. Link » Emiliano Diaz · Kenza Tazi · Ashwin Braude · Daniel Okoh · Kara Lamb · Duncan Watson-Parris · Paula Harder · Nis Meinert 🔗 - Rhino: Deep Causal Temporal Relationship Learning with history-dependent noise ( Poster )  link » Discovering causal relationships between different variables from time series data has been a long-standing challenge for many domains. For example, in stock markets, the announcement of acquisitions by leading companies may have immediate effects on stock prices and increase the uncertainty of the future market. This requires the model to take non-linear relationships, instantaneous effects, and past-action-dependent uncertainty into account. We name the latter history-dependent noise.
However, previous works do not offer a solution addressing all these problems together. In this paper, we propose a structural equation model, called Rhino, which combines vector auto-regression, deep learning and variational inference to model non-linear relationships with instantaneous effects and flexible history-dependent noise. Theoretically, we prove the structural identifiability for a generalization of Rhino. Our empirical results from extensive synthetic experiments and a real-world benchmark demonstrate better discovery performance compared to relevant baselines, with ablation studies revealing its robustness when Rhino is misspecified. Link » Wenbo Gong · Joel Jennings · Cheng Zhang · Nick Pawlowski 🔗 - Causal Analysis of the TOPCAT Trial: Spironolactone for Preserved Cardiac Function Heart Failure ( Poster )  link » We describe the results of applying causal discovery methods on the data from a multi-site clinical trial, on the Treatment of Preserved Cardiac Function Heart Failure with an Aldosterone Antagonist (TOPCAT). The trial was inconclusive, with no clear benefits consistently shown for the whole cohort. However, there were questions regarding the reliability of the diagnosis and treatment protocol for a geographic subgroup of the cohort. With the inclusion of medical context in the form of domain knowledge, causal discovery is used to demonstrate regional discrepancies and to frame the regional transportability of the results. Furthermore, we show that, globally and especially for some subgroups, the treatment has significant causal effects, thus offering a more refined view of the trial results.
Link » Francesca Raimondi · Tadhg O'Keeffe · Hana Chockler · Andrew Lawrence · Tamara Stemberga · Andre Franca · Maksim Sipos · Javed Butler · Shlomo Ben-Haim 🔗 - Conditional differential measurement error: partial identifiability and estimation ( Poster )  link » Differential measurement error, which occurs when the level of error in the measured outcome is correlated with the treatment, renders the causal effect unidentifiable from observational data. We study conditional differential measurement error, where a subgroup of the population is known to be prone to differential measurement error. Under an assumption about the direction (but not magnitude) of the measurement error, we derive sharp bounds on the conditional average treatment effect and present an approach to estimate them. We empirically validate our approach on semi-synthetic and real data, showing that it gives a more credible and informative bound than other approaches. Link » Pengrun Huang · Maggie Makar 🔗 - Active Bayesian Causal inference ( Poster )  link » Causal discovery and causal reasoning are classically treated as separate and consecutive tasks: one first infers the causal graph, and then uses it to estimate causal effects of interventions. However, such a two-stage approach is uneconomical, especially in terms of actively collected interventional data, since the causal query of interest may not require a fully-specified causal model. From a Bayesian perspective, it is also unnatural, since a causal query (e.g., the causal graph or some causal effect) can be viewed as a latent quantity subject to posterior inference; quantities that are not of direct interest ought to be marginalized out in this process, thus contributing to our overall uncertainty. In this work, we propose Active Bayesian Causal Inference (ABCI), a fully-Bayesian active learning framework for integrated causal discovery and reasoning, i.e., for jointly inferring a posterior over causal models and queries of interest.
In our approach to ABCI, we focus on the class of causally-sufficient nonlinear additive Gaussian noise models, which we model using Gaussian processes. To capture the space of causal graphs, we use a continuous latent graph representation, allowing our approach to scale to practically relevant problem sizes. We sequentially design experiments that are maximally informative about our target causal query, collect the corresponding interventional data, update our beliefs, and repeat. Through simulations, we demonstrate that our approach is more data-efficient than existing methods that only focus on learning the full causal graph. This allows us to accurately learn downstream causal queries from fewer samples, while providing well-calibrated uncertainty estimates of the quantities of interest. Link » Christian Toth · Lars Lorch · Christian Knoll · Andreas Krause · Franz Pernkopf · Robert Peharz · Julius von Kügelgen 🔗 - Bounding the Effects of Continuous Treatments for Hidden Confounders ( Poster )  link » Observational studies often seek to infer the causal effect of a treatment even though both the assigned treatment and the outcome depend on other confounding variables. An effective strategy for dealing with confounders is to estimate a propensity model that corrects for the relationship between covariates and assigned treatment. Unfortunately, the confounding variables themselves are not always observed, in which case we can only bound the propensity, and therefore bound the magnitude of causal effects. In many important cases, like administering a dose of some medicine, the possible treatments belong to a continuum. Sensitivity models, which are required to tie the true propensity to something that can be estimated, have been explored for binary treatments. We propose one for continuous treatments. 
We develop a framework to compute ignorance intervals on the partially identified dose-response curves, enabling us to quantify the susceptibility of an inference to hidden confounders. We show with real-world observational studies that our approach can give non-trivial bounds on causal effects from continuous treatments in the presence of hidden confounders. Link » Myrl Marmarelis · Greg Ver Steeg · Neda Jahanshad · Aram Galstyan 🔗 - Local Causal Discovery for Estimating Causal Effects ( Poster )  link » Even when the causal graph underlying our data is unknown, we can nevertheless narrow down the possible values that an average treatment effect (ATE) can take by (1) identifying the graph up to a Markov equivalence class; and (2) estimating that ATE for each graph in the class. While the PC algorithm can identify this class under strong faithfulness assumptions, it can be computationally prohibitive. Fortunately, only the local graph structure around the treatment is required to identify an ATE, a fact exploited by local discovery algorithms to identify the possible values for an ATE more efficiently. In this paper, we introduce Local Discovery using Eager Collider Checks (LDECC), a new local discovery algorithm that finds colliders and orients the treatment's parents differently from existing methods. We show that there exist graphs where our algorithm exponentially outperforms existing local discovery algorithms and vice versa. Moreover, we show that LDECC and existing algorithms rely on different sets of faithfulness assumptions. We leverage this insight to show that it is possible to test and recover from certain faithfulness violations. Link » Shantanu Gupta · David Childers · Zachary Lipton 🔗 - Partial identification without distributional assumptions ( Poster )  link » Causal effect estimation is important for numerous tasks in the natural and social sciences. 
However, identifying effects is impossible from observational data without making strong, often untestable assumptions that might not be applicable to real-world data. We consider algorithms for the partial identification problem, bounding the effects of multivariate, continuous treatments over multiple possible causal models when unmeasured confounding makes identification impossible. Even in the partial identification setting, most current work applies only to discrete data. We propose a framework which is applicable to continuous high-dimensional data. The observable evidence is matched to the implications of constraints encoded in a causal model by norm-based criteria. In particular, for the IV setting, we present ways by which such constrained optimization problems can be parameterized without likelihood functions for the causal or the observed data model, reducing the computational and statistical complexity of the task. Link » Kirtan Padh · Jakob Zeitler · David Watson · Matt Kusner · Ricardo Silva · Niki Kilbertus 🔗 - Trust Your $\nabla$: Gradient-based Intervention Targeting for Causal Discovery ( Poster )  link » Inferring causal structure from data is a challenging task of fundamental importance in science. Observational data are often insufficient to identify a system’s causal structure uniquely. While conducting interventions (i.e., experiments) can improve the identifiability, such samples are usually challenging and expensive to obtain. Hence, experimental design approaches for causal discovery aim to minimize the number of interventions by estimating the most informative intervention target. In this work, we propose a novel gradient-based intervention targeting method, abbreviated GIT, that 'trusts' the gradient estimator of a gradient-based causal discovery framework to provide signals for the intervention acquisition function.
We provide extensive experiments in simulated and real-world datasets and demonstrate that GIT performs on par with competitive baselines, surpassing them in the low-data regime. Link » Mateusz Olko · Michał Zając · Aleksandra Nowak · Nino Scherrer · Yashas Annadani · Stefan Bauer · Łukasz Kuciński · Piotr Miłoś 🔗 - A Novel Two-level Causal Inference Framework for On-road Vehicle Quality Issues Diagnosis ( Poster )  link » In the automotive industry, the full cycle of managing in-use vehicle quality issues can take weeks to investigate. The process involves isolating root causes, defining and implementing appropriate treatments, and refining treatments if needed. The main pain point is the lack of a systematic method to identify causal relationships, evaluate treatment effectiveness, and direct the next actionable treatment if the current treatment was deemed ineffective. This paper shows how we leverage causal machine learning (ML) to speed up such processes. A real-world dataset collected from on-road vehicles is used to demonstrate the proposed framework. Open challenges for vehicle quality applications are also discussed. Link » Qian Wang · Huanyi Shui · Thi Tu Trinh Tran · Milad Nezhad · Devesh Upadhyay · Kamran Paynabar · Anqi He 🔗 - A kernel balancing approach that scales to big data ( Poster )  link » In causal inference, weighting is commonly used for covariate adjustment. Procedurally, weighting can be accomplished either through methods that model the propensity score, or methods that use convex optimization to find the weights that balance the covariates directly. However, the computational demand of the balancing approach has to date precluded it from including broad classes of functions of the covariates in large datasets. To address this problem, we outline a scalable approach to balancing that incorporates a kernel representation of a broad class of basis functions.
First, we use the Nyström method to rapidly generate a kernel basis in a reproducing kernel Hilbert space containing a broad class of basis functions of the covariates. Then, we integrate these basis functions as constraints in a state-of-the-art implementation of the alternating direction method of multipliers, which rapidly finds the optimal weights that balance the general basis functions in the kernel. Using this kernel balancing approach, we conduct a national observational study of the relationship between hospital profit status and treatment and outcomes of heart attack care in a large dataset containing 1.27 million patients and over 3,500 hospitals. After weighting, we observe that for-profit hospitals perform percutaneous coronary intervention at similar rates as other hospitals; however, their patients have slightly worse mortality and higher readmission rates. Link » Kwangho Kim · Bijan Niknam · Jose Zubizarreta 🔗 - Causal Bandits: Online Decision-Making in Endogenous Settings ( Poster )  link » The deployment of Multi-Armed Bandits (MAB) has become commonplace in many economic applications. However, regret guarantees for even state-of-the-art linear bandit algorithms (such as Optimism in the Face of Uncertainty Linear bandit (OFUL)) make strong exogeneity assumptions w.r.t. arm covariates. This assumption is very often violated in many economic contexts and using such algorithms can lead to sub-optimal decisions. In this paper, we consider the problem of online learning in linear stochastic multi-armed bandit problems with endogenous covariates. We propose an algorithm we term BanditIV, which uses instrumental variables to correct for this bias, and prove an $\tilde{\mathcal{O}}(k\sqrt{T})$ upper bound for the expected regret of the algorithm. Further, in economic contexts, it is also important to understand how the model parameters behave asymptotically.
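The Nyström step named in the kernel-balancing abstract above can be illustrated on a small RBF kernel. This is a generic low-rank sketch with hypothetical simulated data and randomly chosen landmarks, not the authors' implementation:

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    """K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * sq)

def nystrom(X, m, gamma=0.5, seed=0):
    """Rank-m Nyström approximation: K ≈ K_nm K_mm^+ K_mn."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X), size=m, replace=False)
    K_nm = rbf_kernel(X, X[idx], gamma)   # kernel between all points and landmarks
    K_mm = K_nm[idx]                      # landmark-landmark block
    return K_nm @ np.linalg.pinv(K_mm) @ K_nm.T

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
K = rbf_kernel(X, X)                      # full kernel matrix (for comparison only)
K_hat = nystrom(X, m=80)                  # rank-80 approximation
rel_err = np.linalg.norm(K - K_hat) / np.linalg.norm(K)
```

In the large-data regime the point is that `K_hat`'s factors are computed without ever materializing the full $n \times n$ kernel matrix.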
To this end, we additionally propose the $\epsilon$-BanditIV algorithm and demonstrate its asymptotic consistency and normality while ensuring the same regret bound. Finally, we carry out extensive Monte Carlo simulations to demonstrate the performance of our algorithms compared to other methods. We show that BanditIV and $\epsilon$-BanditIV significantly outperform other existing methods. Link » Jingwen Zhang · Yifang Chen · Amandeep Singh 🔗 - Rethinking Neural Relational Inference for Granger Causal Discovery ( Poster )  link » Granger causal discovery aims to infer the underlying Granger causal relationships between pairs of variables in a multivariate time series system. Recent work has proposed using Neural Relational Inference (NRI) -- a latent graph inference model -- for Granger causal discovery. However, the conditions under which NRI succeeds in recovering the true Granger causal graph remain unknown. In this work we show how the mean field approximation inherent in NRI has significant implications for its ability to recover the Granger causal structure in multivariate time series. We illustrate this point theoretically and experimentally using a linear vector autoregressive model -- an important benchmark in economic and financial studies. Link » Stefanos Bennett · Rose Yu 🔗 - Machine learning reveals how personalized climate communication can both succeed and backfire ( Poster )  link » Different advertising messages work for different people. Machine learning can be an effective way to personalise climate communications. In this paper, we use machine learning to reanalyse findings from a recent study, showing that online advertisements increased climate change belief in some people while decreasing it in others. In particular, we show that the effect of the advertisements could change depending on a person's age and ethnicity. Our findings have broad methodological and practical applications.
Link » Totte Harinen · Alexandre Filipowicz · Shabnam Hakimi · Rumen Iliev · Matt Klenk · Emily Sumner 🔗 - Causal Reasoning in the Presence of Latent Confounders via Neural ADMG Learning ( Poster )  link » Latent confounding has been a long-standing obstacle for causal reasoning from observational data. One popular approach is to model the data using acyclic directed mixed graphs (ADMGs), which describe ancestral relations between variables using directed and bidirected edges. However, existing methods using ADMGs are based on either linear functional assumptions or a discrete search that is complicated to use and lacks computational tractability for large datasets. In this work, we further extend the existing body of work and develop a novel gradient-based approach to learning an ADMG with nonlinear functional relations from observational data. We first show that the presence of latent confounding is identifiable under the assumptions of bow-free ADMGs with nonlinear additive noise models. With this insight, we propose a novel neural causal model based on autoregressive flows. This not only enables us to model complex causal relationships behind the data, but also to estimate their functional relationships (hence treatment effects) simultaneously. We further validate our approach via experiments on both synthetic and real-world datasets, and demonstrate the competitive performance against relevant baselines. Link » Matthew Ashman · Chao Ma · Agrin Hilmkil · Joel Jennings · Cheng Zhang 🔗 - Causal Discovery using Marginal Likelihood ( Poster )  link » Causal discovery is an important problem in many fields such as medicine, epidemiology, or economics. Here, causal structure is necessary to relay information about the effectiveness of treatments. Recently, causal structure has also been linked with generalisation and out-of-distribution generalisation in prediction tasks. This problem, however, is only solvable up to a Markov equivalence class without strong assumptions.
Previous work has made assumptions on the data generation process to render the causal graph identifiable. These methods fail when the data generation assumptions no longer hold. In this work, we directly algorithmise the independence of causal mechanisms (ICM) assumption to achieve a flexible causal discovery algorithm. In the bivariate case, this is done by showing that independent parametrisation with independent priors encodes an ICM assumption. We show that this implies different marginal likelihoods for models of different causal directions. Using a Bayesian model selection procedure to take advantage of this, we show that our method outperforms competing methods. Link » Anish Dhir · Mark van der Wilk 🔗 - Deep Structural Causal Modelling of the Clinical and Radiological Phenotype of Alzheimer’s Disease ( Poster )  link » Alzheimer's disease (AD) has a poorly understood aetiology. Patients often have different rates and patterns of brain atrophy, and present at different stages along the natural history of their condition. This means that establishing the relationships between disease-related variables, and subsequently linking the clinical and radiological phenotypes of AD, is difficult. Investigating this link is important because it could ultimately allow for a better understanding of the disease process, and this could enable tasks such as treatment effect estimation, disease progression modelling, and better precision medicine for AD patients. We extend a class of deep structural causal models (DSCMs) to the clinical and radiological phenotype of AD, and propose an aetiological model of relevant patient demographics, imaging and clinical biomarkers, and cognitive assessment/educational scores based on specific current hypotheses in the medical literature. The trained DSCM produces biologically plausible counterfactuals relating to the specified disease covariates, and reproduces ground-truth longitudinal changes in magnetic resonance images of AD.
Such a model could enable the assessment of the effects of intervening on variables outside a randomized controlled trial setting. In addition, by being explicit about how causal relationships are encoded, the framework provides a principled approach to define and assess hypotheses of the aetiology of AD. Code to replicate the experiment can be found at: https://github.com/aay993/counterfactual_AD Link » Ahmed Abdulaal · Daniel C. Castro · Daniel Alexander 🔗 - Learning Causal Representations of Single Cells via Sparse Mechanism Shift Modeling ( Poster )  link » Latent variable models have become a go-to tool for analyzing biological data, especially in the field of single-cell genomics. One remaining challenge is the identification of individual latent variables related to biological pathways, more generally conceptualized as disentanglement. Although versions of variational autoencoders that explicitly promote disentanglement were introduced and applied to single-cell genomics data, the theoretical feasibility of disentanglement from independent and identically distributed measurements has been challenged. Recent methods propose instead to leverage non-stationary data, as well as the sparse mechanism assumption, in order to learn disentangled representations with causal semantics. Here, we explore the application of these methodological advances in the analysis of single-cell genomics data with genetic or chemical perturbations. We benchmark these methods on simulated single-cell expression data to evaluate their performance regarding disentanglement, causal target identification and out-of-domain generalisation. Finally, by applying the approaches to a large-scale gene perturbation dataset, we find that the model relying on the sparse mechanism shift hypothesis surpasses contemporary methods on a transfer learning task.
Link » Romain Lopez · Nataša Tagasovska · Stephen Ra · Kyunghyun Cho · Jonathan Pritchard · Aviv Regev 🔗 - Amortized Inference for Causal Structure Learning ( Poster )  link » Learning causal structure poses a combinatorial search problem that typically involves evaluating structures with a score or independence test. The resulting search is costly, and designing suitable scores or tests that capture prior knowledge is difficult. In this work, we propose to amortize causal structure learning. Rather than searching over structures, we train a variational inference model to predict the causal structure from observational or interventional data. This allows us to bypass both the search over graphs and the hand-engineering of suitable score functions. Instead, our inference model acquires domain-specific inductive biases for causal discovery solely from data generated by a simulator. The architecture of our inference model emulates permutation invariances that are crucial for statistical efficiency in structure learning, which facilitates generalization to significantly larger problem instances than seen during training. On synthetic data and semisynthetic gene expression data, our models exhibit robust generalization capabilities when subject to substantial distribution shifts and significantly outperform existing algorithms, especially in the challenging genomics domain. Link » Lars Lorch · Scott Sussex · Jonas Rothfuss · Andreas Krause · Bernhard Schölkopf 🔗 - Discrete Learning Of DAGs Via Backpropagation ( Poster )  link » Recently continuous relaxations have been proposed in order to learn directed acyclic graphs (DAGs) by backpropagation, instead of combinatorial optimization. However, a number of techniques for fully discrete backpropagation could instead be applied. In this paper, we explore this direction and propose DAG-DB, a framework for learning DAGs by Discrete Backpropagation, based on the architecture of Implicit Maximum Likelihood Estimation (I-MLE). 
DAG-DB performs competitively using either of two fully discrete backpropagation techniques: I-MLE itself, or straight-through estimation. Link » Andrew Wren · Pasquale Minervini · Luca Franceschi · Valentina Zantedeschi 🔗 - Interventional Causal Representation Learning ( Poster )  link » The theory of identifiable representation learning aims to build general-purpose methods that extract high-level latent (causal) factors from low-level sensory data. Most existing works focus on identifiable representation learning with observational data, relying on distributional assumptions on latent (causal) factors. However, in practice, we often also have access to interventional data for representation learning. How can we leverage interventional data to help identify high-level latents? In this work, we explore the role of interventional data in identifiable representation learning. We study the identifiability of latent causal factors with and without interventional data, under minimal distributional assumptions on the latents. We prove that, if the true latent variables map to the observed high-dimensional data via a polynomial function, then representation learning via minimizing the standard reconstruction loss of autoencoders identifies the true latents up to affine transformation. If we further have access to interventional data generated by hard do-interventions on some of the latents, then we can identify these intervened latents up to permutation, shift, and scaling. Link » Kartik Ahuja · Yixin Wang · Divyat Mahajan · Yoshua Bengio 🔗 - Exploiting Neighborhood Interference with Low Order Interactions under Unit Randomized Design ( Poster )  link » Network interference, where the outcome of an individual is affected by the treatment of others in their social network, is pervasive in real-world settings. However, it poses a challenge to estimating causal effects. 
We consider the task of estimating the total treatment effect (TTE), or the difference between the average outcomes of the population when everyone is treated versus when no one is, under network interference. Under a non-uniform Bernoulli randomized design, we utilize knowledge of the network structure to provide an unbiased estimator for the TTE when network interference effects are constrained to low-order interactions among neighbors of an individual. We make no assumptions on the graph other than bounded degree, allowing for well-connected networks that may not be easily clustered. We derive a bound on the variance of our estimator and show in simulated experiments that it performs well compared with standard TTE estimators. Link » Mayleen Cortez · Matthew Eichhorn · Christina Yu 🔗 - Synthetic Principal Component Design: Fast Covariate Balancing with Synthetic Controls ( Poster )  link » In this paper, we aim to develop a globally convergent and yet practically tractable optimization algorithm for the optimal experimental design problem with synthetic controls. Specifically, we consider a setting in which pre-treatment outcome data are available. The average treatment effect is estimated via the difference between the weighted average outcomes of the treated and control units, where the weights are learned from the data observed during the pre-treatment periods. We find that if the experimenter has the ability to select an optimal set of non-negative weights, the optimal experimental design problem is identical to a so-called \textit{phase synchronization} problem. We solve this problem via a normalized variant of the generalized power method with spectral initialization. On the theoretical side, we establish the first global optimality guarantee for experiment design under a realizability assumption with linear fixed-effect models (also referred to as "interactive fixed-effect models"). 
These results are surprising, given that the optimal design of experiments, especially involving covariate matching, typically involves solving an NP-hard combinatorial optimization problem. Empirically, we apply our algorithm to US Bureau of Labor Statistics data and the Abadie-Diamond-Hainmueller California smoking data. The experiments demonstrate that our algorithm surpasses the random design by a large margin in terms of root mean square error. Link » Yiping Lu · Jiajin Li · Lexing Ying · Jose Blanchet 🔗 - Investigating causal understanding in LLMs ( Poster )  link » We investigate the quality of causal world models of LLMs in very simple settings. We test whether LLMs can identify cause and effect in natural language settings (taken from BigBench) such as “My car got dirty. I washed the car. Question: Which sentence is the cause of the other?” and in multiple other toy settings. We probe the LLM's world model by changing the presentation of the prompt while keeping the meaning constant, e.g. by changing the order of the sentences or asking the opposite question. Additionally, we test whether the model can be “tricked” into giving wrong answers when we present the few-shot examples in a different pattern than the prompt. We have three findings. Firstly, larger models yield better results. Secondly, k-shot outperforms one-shot, and one-shot outperforms zero-shot, in standard conditions. Thirdly, LLMs perform worse in conditions where form and content differ. We conclude that the form of the presentation matters for LLM predictions or, in other words, that LLMs don't solely base their predictions on content. Finally, we detail some of the implications this research has for AI safety. 
Link » Marius Hobbhahn · Tom Lieberum · David Seiler 🔗 - A Large-Scale Observational Study of the Causal Effects of a Behavioral Health Nudge ( Poster )  link » The Apple Watch encourages users to stand throughout the day by delivering a notification to the user's wrist if they have been sitting for the first 50 minutes of an hour. This simple behavioral intervention exemplifies the classical definition of a nudge as a choice architecture that alters behavior without forbidding options or significantly changing economic incentives. In order to estimate from observational data the causal effect of the notification on the user's standing probability throughout the day, we introduce a novel regression discontinuity design for time series data with time-varying treatment. Using over 76 billion minutes of private and anonymous observational standing data from more than 160,000 subjects enrolled in the public Apple Heart and Movement Study from 2019 to 2022, we show that the nudge increases the probability of standing by up to 49.5% across the studied population. The nudge is similarly effective for participants self-identified as male or female, and it is more effective in older people, increasing the standing probability in people over 75 years old by more than 60%. We also demonstrate that closing Apple Watch Activity Rings, another simple choice architecture that visualizes the participant's daily progress in Move, Exercise, and Stand, correlates with users' responses to the intervention; for users who close their activity rings regularly, the standing nudge almost triples their probability of standing. This observational study, one of the largest of its kind exploring the causal effects of nudges in the general population, demonstrates the effectiveness of simple behavioral health interventions and introduces a novel application of regression discontinuity design, extended here to time-varying treatments. 
Link » Achille Nazaret · Guillermo Sapiro 🔗 - Variational Causal Inference ( Poster )  link » Estimating an individual's potential outcomes under counterfactual treatments is a challenging task for traditional causal inference and supervised learning approaches when the outcome is high-dimensional (e.g. gene expressions, impulse responses, human faces) and covariates are relatively limited. In this case, to construct one's outcome under a counterfactual treatment, it is crucial to leverage individual information contained in its observed factual outcome on top of the covariates. We propose a deep variational Bayesian framework that rigorously integrates two main sources of information for outcome construction under a counterfactual treatment: one source is the individual features embedded in the high-dimensional factual outcome; the other source is the response distribution of similar subjects (subjects with the same covariates) that factually received this treatment of interest. Link » Yulun Wu · Layne Price · Zichen Wang · Vassilis Ioannidis · Rob Barton · George Karypis 🔗
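The synthetic control design abstract above reduces optimal experimental design to a phase synchronization problem, solved with a normalized variant of the generalized power method after spectral initialization. As a rough, hypothetical illustration of that machinery (not the paper's algorithm, which optimizes non-negative weights inside an experimental-design objective), here is the textbook power iteration for the simplest Z2 (sign) synchronization case; all names, sizes, and noise levels below are made up for the toy:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
z_true = rng.choice([-1.0, 1.0], size=n)            # hidden +/-1 labels to recover
noise = rng.standard_normal((n, n))
C = np.outer(z_true, z_true) + 0.5 * (noise + noise.T) / 2   # noisy pairwise agreements

# spectral initialization: sign pattern of the leading eigenvector of C
vals, vecs = np.linalg.eigh(C)
z = np.sign(vecs[:, -1])
z[z == 0] = 1.0

# power method: repeatedly project C @ z back onto the feasible set {-1, +1}^n
for _ in range(50):
    z_new = np.sign(C @ z)
    z_new[z_new == 0] = 1.0
    if np.array_equal(z_new, z):                    # fixed point reached
        break
    z = z_new

# recovery is only defined up to a global sign flip
acc = max(np.mean(z == z_true), np.mean(z == -z_true))
```

The spectral step matters: started from a random sign vector, the iteration can stall in a poor fixed point, whereas the leading eigenvector already correlates with the planted signs and the projection step then cleans it up.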

Author Information

Cheng Zhang (Microsoft Research, Cambridge, UK)

Cheng Zhang is a principal researcher at Microsoft Research Cambridge, UK. She leads the Data Efficient Decision Making (Project Azua) team at Microsoft. Before joining Microsoft, she was with the statistical machine learning group of Disney Research Pittsburgh, located at Carnegie Mellon University. She received her Ph.D. from the KTH Royal Institute of Technology. She is interested in advancing machine learning methods, including variational inference, deep generative models, and sequential decision-making under uncertainty, and in adapting machine learning to socially impactful applications such as education and healthcare. She co-organized the Symposium on Advances in Approximate Bayesian Inference from 2017 to 2019.