

Workshop

NIPS 2013 Workshop on Causality: Large-scale Experiment Design and Inference of Causal Mechanisms

Isabelle Guyon · Léon Bottou · Bernhard Schölkopf · Alexander Statnikov · Evelyne Viegas · James M. Robins

Harrah's Glenbrook+Emerald

Mon 9 Dec, 7:30 a.m. PST

The goal of this workshop is to discuss new methods of large-scale experiment design and their application to the inference of causal mechanisms, and to promote their evaluation via a series of challenges. Emphasis will be put on capitalizing on massive amounts of available observational data to cut down the number of experiments needed, on pseudo- or quasi-experiments, on iterative designs, and on the on-line acquisition of data with minimal perturbation of the system under study. Participants in the cause-effect pairs challenge http://www.causality.inf.ethz.ch/cause-effect.php will be encouraged to submit papers.

The problem of attributing causes to effects is pervasive in science, medicine, economics, and almost every aspect of everyday life involving human reasoning and decision making. What affects your health? The economy? Climate change? The gold standard for establishing causal relationships is the randomized controlled experiment. However, experiments are costly, while non-experimental "observational" data collected routinely around the world are readily available. Unraveling potential cause-effect relationships from such observational data could save a lot of time and effort by allowing us to prioritize confirmatory experiments. This could be complemented by new strategies of incremental experimental design combining observational and experimental data.

So far, much of machine learning has concentrated on analyzing data already collected rather than on collecting data. While experimental design is a well-developed discipline of statistics, data collection practitioners often neglect to apply its principled methods. As a result, the data collected and made available to data analysts, who are in charge of explaining them and building predictive or causal models, are not always of good quality and are plagued by experimental artifacts. In reaction to this situation, some researchers in machine learning have become interested in experimental design to close the gap between data acquisition or experimentation and model building. In parallel, researchers in causal studies have started raising awareness of the differences between passive observation, active sampling, and intervention. In this domain, only interventions qualify as true experiments capable of unraveling cause-effect relationships.

This workshop will discuss methods of experimental design that involve machine learning in the process of data collection. Experiments require intervening on the system under study, which is usually expensive and sometimes unethical or impossible. Changing the course of the planets to study the tides is impossible; forcing people to smoke to study the influence of smoking on health is unethical; modifying the placement of ads on web pages to optimize revenue may be expensive. In the latter case, recent methods proposed by Léon Bottou and others involve minimally perturbing the process with small random interventions to collect interventional data around the operating point and extrapolating to estimate the effect of various interventions. Presently, there is a profusion of other algorithms being proposed, mostly evaluated on toy problems. One of the main challenges in causal learning is developing strategies for objective evaluation. This includes, for instance, methods for acquiring large, representative data sets with known ground truth. This, in turn, raises the question of whether the regularities observed in such data sets also hold in the data sets of interest, whose causal structure is unknown, since data sets with known ground truth may not be representative.
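To make the perturb-and-extrapolate idea concrete, here is a minimal Python sketch of one standard estimator for this setting, inverse propensity scoring: log the probability of each randomly perturbed action, then reweight observed outcomes to estimate what a different operating point would yield. All data, policies, and names below are hypothetical illustrations, not the actual method or system of Bottou et al.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical logged data: the system mostly keeps its current ad placement
# (action 0) but applies small random perturbations (actions 1 and 2),
# recording each action, its logging probability, and the observed outcome.
n = 100_000
logging_probs = np.array([0.8, 0.1, 0.1])
actions = rng.choice(3, size=n, p=logging_probs)
true_mean_reward = np.array([1.0, 1.3, 0.7])   # unknown to the estimator
rewards = rng.normal(true_mean_reward[actions], 0.5)

def ips_estimate(target_probs, actions, rewards, logging_probs):
    # Reweight each logged outcome by how much more (or less) often the
    # counterfactual policy would have taken the logged action.
    weights = target_probs[actions] / logging_probs[actions]
    return np.mean(weights * rewards)

# Extrapolate to a counterfactual policy that favors placement 1.
counterfactual = np.array([0.1, 0.8, 0.1])
print(ips_estimate(counterfactual, actions, rewards, logging_probs))
# Expected value: 0.1*1.0 + 0.8*1.3 + 0.1*0.7 = 1.21

The estimate is unbiased as long as the logging policy gives every action nonzero probability, which is exactly what the small random perturbations provide.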

As part of an ongoing effort to benchmark causal discovery methods, we organized a new challenge [March 28 - September 2, 2013] whose purpose is to devise a "coefficient of causation": given samples of a pair of variables (A, B), compute a coefficient between -Inf and +Inf, with large positive values indicating that A causes B, large negative values indicating that B causes A, and values near zero indicating no causal relationship.
We provided hundreds of pairs of real variables with known causal relationships from domains as diverse as chemistry, climatology, ecology, economy, engineering, epidemiology, genomics, medicine, physics, and sociology. These are intermixed with controls (pairs of independent variables and pairs of variables that are dependent but not causally related) and semi-artificial cause-effect pairs (real variables mixed in various ways to produce a given outcome). This challenge is limited to pairs of variables deprived of their context, so constraint-based methods relying on conditional independence tests and/or graphical models are not applicable. The goal is to push the state of the art in complementary methods, which can eventually disambiguate Markov equivalence classes.
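As a toy illustration of what such a coefficient might look like (this is not a challenge baseline, and the function below is purely hypothetical), the following Python sketch exploits the asymmetry of additive-noise models: fit a simple regression in each direction and favor the direction whose residuals look more independent of the regressor.

import numpy as np

def causation_coefficient(a, b, bins=10):
    # Toy score: positive suggests a -> b, negative suggests b -> a.
    # Uses the absolute correlation between the regressor and the squared
    # residuals as a crude dependence measure; real challenge entries used
    # far stronger independence tests and richer model classes.
    def residual_dependence(x, y):
        # Piecewise-constant fit of y on x over equal-count bins of x.
        order = np.argsort(x)
        xs, ys = x[order], y[order]
        resid = np.empty_like(ys)
        for chunk in np.array_split(np.arange(len(xs)), bins):
            resid[chunk] = ys[chunk] - ys[chunk].mean()
        return abs(np.corrcoef(xs, resid**2)[0, 1])

    # If a causes b with additive noise, the residuals of b given a should
    # be nearly independent of a, while the reverse fit typically leaves
    # residuals that remain dependent on b.
    return residual_dependence(b, a) - residual_dependence(a, b)

# Sanity check on a synthetic pair with known direction a -> b.
rng = np.random.default_rng(1)
a = rng.uniform(-2, 2, 5000)
b = a**2 + rng.normal(0, 0.3, 5000)
print(causation_coefficient(a, b))   # expected: clearly positive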
We are also planning to run, in October-November 2013, a second edition of the cause-effect pairs challenge designed to attract students who want to learn about the problem and build on top of the best challenge submissions. This event will be sponsored in part by Microsoft and will serve to beta-test CodaLab, a new machine learning experimentation platform that will be launched in 2014.

Part of the workshop will be devoted to discussing the results of the challenge and planning future events, which may include a causality-in-time-series challenge and a series of challenges on experimental design in which participants can conduct virtual experiments on artificial systems. The workshop will bring together researchers in machine learning and statistics and in application domains including computational biology and econometrics.
