The machine learning (ML) toolbox for estimation of heterogeneous treatment effects from observational data is expanding rapidly, yet many of its algorithms have been evaluated only on a very limited set of semi-synthetic benchmark datasets. In this paper, we investigate current benchmarking practices for ML-based conditional average treatment effect (CATE) estimators, with special focus on empirical evaluation based on the popular semi-synthetic IHDP benchmark. We identify problems with current practice and highlight that semi-synthetic benchmark datasets, which (unlike real-world benchmarks used elsewhere in ML) do not necessarily reflect properties of real data, can systematically favor some algorithms over others -- a fact that is rarely acknowledged but of immense relevance for interpretation of empirical results. Further, we argue that current evaluation metrics evaluate performance only for a small subset of possible use cases of CATE estimators, and discuss alternative metrics relevant for applications in personalized medicine. Additionally, we discuss alternatives to current benchmark datasets, and implications of our findings for benchmarking in CATE estimation.
Author Information
Alicia Curth (University of Cambridge)
David Svensson (Chalmers University)
Jim Weatherall (AstraZeneca)
Mihaela van der Schaar (University of Cambridge)
More from the Same Authors
-
2021 Spotlight: On Inductive Biases for Heterogeneous Treatment Effect Estimation »
Alicia Curth · Mihaela van der Schaar -
2021 Spotlight: Explaining Latent Representations with a Corpus of Examples »
Jonathan Crabbé · Zhaozhi Qian · Fergus Imrie · Mihaela van der Schaar -
2021 : The Medkit-Learn(ing) Environment: Medical Decision Modelling through Simulation »
Alex Chan · Ioana Bica · Alihan Hüyük · Daniel Jarrett · Mihaela van der Schaar -
2022 : Adaptively Identifying Patient Populations With Treatment Benefit in Clinical Trials »
Alicia Curth · Alihan Hüyük · Mihaela van der Schaar -
2022 : Causal ML for medicines R&D »
Jim Weatherall -
2022 Poster: Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability »
Jonathan Crabbé · Alicia Curth · Ioana Bica · Mihaela van der Schaar -
2021 Poster: Invariant Causal Imitation Learning for Generalizable Policies »
Ioana Bica · Daniel Jarrett · Mihaela van der Schaar -
2021 Poster: Explaining Latent Representations with a Corpus of Examples »
Jonathan Crabbé · Zhaozhi Qian · Fergus Imrie · Mihaela van der Schaar -
2021 Poster: Time-series Generation by Contrastive Imitation »
Daniel Jarrett · Ioana Bica · Mihaela van der Schaar -
2021 Poster: Closing the loop in medical decision support by understanding clinical decision-making: A case study on organ transplantation »
Yuchao Qin · Fergus Imrie · Alihan Hüyük · Daniel Jarrett · Alexander Gimson · Mihaela van der Schaar -
2021 Poster: DECAF: Generating Fair Synthetic Data Using Causally-Aware Generative Networks »
Boris van Breugel · Trent Kyono · Jeroen Berrevoets · Mihaela van der Schaar -
2021 Poster: MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms »
Trent Kyono · Yao Zhang · Alexis Bellot · Mihaela van der Schaar -
2021 Poster: Conformal Time-series Forecasting »
Kamile Stankeviciute · Ahmed M. Alaa · Mihaela van der Schaar -
2021 Poster: Integrating Expert ODEs into Neural ODEs: Pharmacology and Disease Progression »
Zhaozhi Qian · William Zame · Lucas Fleuren · Paul Elbers · Mihaela van der Schaar -
2021 Poster: SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data »
Alicia Curth · Changhee Lee · Mihaela van der Schaar -
2021 Poster: On Inductive Biases for Heterogeneous Treatment Effect Estimation »
Alicia Curth · Mihaela van der Schaar -
2021 Poster: SyncTwin: Treatment Effect Estimation with Longitudinal Outcomes »
Zhaozhi Qian · Yao Zhang · Ioana Bica · Angela Wood · Mihaela van der Schaar -
2021 Poster: Estimating Multi-cause Treatment Effects via Single-cause Perturbation »
Zhaozhi Qian · Alicia Curth · Mihaela van der Schaar -
2019 Poster: Time-series Generative Adversarial Networks »
Jinsung Yoon · Daniel Jarrett · Mihaela van der Schaar -
2016 Poster: Balancing Suspense and Surprise: Timely Decision Making with Endogenous Information Acquisition »
Ahmed M. Alaa · Mihaela van der Schaar -
2016 Poster: A Non-parametric Learning Method for Confidently Estimating Patient's Clinical State and Dynamics »
William Hoiles · Mihaela van der Schaar -
2014 Poster: Discovering, Learning and Exploiting Relevance »
Cem Tekin · Mihaela van der Schaar