High average model performance can hide the fact that models may systematically underperform on subgroups of the data. We consider the tabular setting, which surfaces the unique issue of outcome heterogeneity. This is prevalent in areas such as healthcare, where patients with similar features can have different outcomes, making reliable predictions challenging. To tackle this, we propose Data-IQ, a framework to systematically stratify examples into subgroups with respect to their outcomes. We do this by analyzing the behavior of individual examples during training, based on their predictive confidence and, importantly, the aleatoric (data) uncertainty. Capturing the aleatoric uncertainty permits a principled characterization and subsequent stratification of data examples into three distinct subgroups (Easy, Ambiguous, Hard). We experimentally demonstrate the benefits of Data-IQ on four real-world medical datasets. We show that, compared to baselines, Data-IQ's characterization of examples is the most robust to variation across similarly performant (yet different) models. Since Data-IQ can be used with any ML model (including neural networks, gradient boosting, etc.), this property ensures consistency of data characterization while allowing flexible model selection. Taking this a step further, we demonstrate that the subgroups enable us to construct new approaches to both feature acquisition and dataset selection. Furthermore, we highlight how the subgroups can inform reliable model usage, noting the significant impact of the Ambiguous subgroup on model generalization.
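The stratification described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give formulas or thresholds, so the sketch assumes that confidence is the mean predicted probability of the true label across training checkpoints, that aleatoric uncertainty is the mean of p(1 - p) across those checkpoints, and that the cut-offs `conf_thresh` and `aleatoric_percentile` are hypothetical parameters chosen for illustration. The function name `stratify_examples` is likewise hypothetical.

```python
import numpy as np


def stratify_examples(probs_true, conf_thresh=0.75, aleatoric_percentile=50):
    """Assign each example to Easy, Ambiguous, or Hard.

    probs_true: array of shape (n_checkpoints, n_examples) holding the
    predicted probability of each example's true label at each training
    checkpoint. Thresholds are illustrative assumptions, not values from
    the source.
    """
    probs_true = np.asarray(probs_true, dtype=float)

    # Assumed definition: average confidence in the true label over training.
    confidence = probs_true.mean(axis=0)

    # Assumed definition: average Bernoulli variance p(1 - p) over training.
    aleatoric = (probs_true * (1.0 - probs_true)).mean(axis=0)

    # Examples at or below the chosen percentile count as low uncertainty.
    low_unc = aleatoric <= np.percentile(aleatoric, aleatoric_percentile)

    # Default to Ambiguous, then carve out the two low-uncertainty groups.
    groups = np.full(confidence.shape, "Ambiguous", dtype=object)
    groups[(confidence >= conf_thresh) & low_unc] = "Easy"
    groups[(confidence <= 1.0 - conf_thresh) & low_unc] = "Hard"
    return confidence, aleatoric, groups


# Toy input: 3 checkpoints (rows) by 4 examples (columns).
toy_probs = np.array([
    [0.90, 0.10, 0.50, 0.60],
    [0.95, 0.05, 0.55, 0.40],
    [0.99, 0.02, 0.45, 0.50],
])
confidence, aleatoric, groups = stratify_examples(toy_probs)
print(list(groups))
```

Because the sketch only consumes per-checkpoint probabilities, it works with any model that exposes them during training, which matches the abstract's point that the characterization is model-agnostic.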
Author Information
Nabeel Seedat (University of Cambridge)
Jonathan Crabbé (University of Cambridge)
Ioana Bica (DeepMind)
Mihaela van der Schaar (University of Cambridge)
More from the Same Authors
-
2021 Spotlight: On Inductive Biases for Heterogeneous Treatment Effect Estimation »
Alicia Curth · Mihaela van der Schaar -
2021 Spotlight: Explaining Latent Representations with a Corpus of Examples »
Jonathan Crabbe · Zhaozhi Qian · Fergus Imrie · Mihaela van der Schaar -
2021 : Really Doing Great at Estimating CATE? A Critical Look at ML Benchmarking Practices in Treatment Effect Estimation »
Alicia Curth · David Svensson · Jim Weatherall · Mihaela van der Schaar -
2021 : The Medkit-Learn(ing) Environment: Medical Decision Modelling through Simulation »
Alex Chan · Ioana Bica · Alihan Hüyük · Daniel Jarrett · Mihaela van der Schaar -
2022 : Adaptively Identifying Patient Populations With Treatment Benefit in Clinical Trials »
Alicia Curth · Alihan Hüyük · Mihaela van der Schaar -
2022 : D-CIPHER: Discovery of Closed-form Partial Differential Equations »
Krzysztof Kacprzyk · Zhaozhi Qian · Mihaela van der Schaar -
2022 : Practical Approaches for Fair Learning with Multitype and Multivariate Sensitive Attributes »
Tennison Liu · Alex Chan · Boris van Breugel · Mihaela van der Schaar -
2022 : Closing Remarks »
Cheng Zhang · Mihaela van der Schaar -
2022 : Panel Discussion »
Cheng Zhang · Mihaela van der Schaar · Ilya Shpitser · Aapo Hyvarinen · Yoshua Bengio · Bernhard Schölkopf -
2022 Workshop: Causal Machine Learning for Real-World Impact »
Nick Pawlowski · Jeroen Berrevoets · Caroline Uhler · Kun Zhang · Mihaela van der Schaar · Cheng Zhang -
2022 : Opening Remarks »
Cheng Zhang · Mihaela van der Schaar -
2022 Workshop: Synthetic Data for Empowering ML Research »
Mihaela van der Schaar · Zhaozhi Qian · Sergul Aydore · Dimitris Vlitas · Dino Oglic · Tucker Balch -
2022 Poster: Concept Activation Regions: A Generalized Framework For Concept-Based Explanations »
Jonathan Crabbé · Mihaela van der Schaar -
2022 Poster: Online Decision Mediation »
Daniel Jarrett · Alihan Hüyük · Mihaela van der Schaar -
2022 Poster: Benchmarking Heterogeneous Treatment Effect Models through the Lens of Interpretability »
Jonathan Crabbé · Alicia Curth · Ioana Bica · Mihaela van der Schaar -
2022 Poster: Transfer Learning on Heterogeneous Feature Spaces for Treatment Effects Estimation »
Ioana Bica · Mihaela van der Schaar -
2022 Poster: Composite Feature Selection Using Deep Ensembles »
Fergus Imrie · Alexander Norcliffe · Pietro Lió · Mihaela van der Schaar -
2022 Poster: Synthetic Model Combination: An Instance-wise Approach to Unsupervised Ensemble Learning »
Alex Chan · Mihaela van der Schaar -
2021 : Invited talk 8 »
Mihaela van der Schaar -
2021 : Invited talk #5: Mihaela van der Schaar »
Mihaela van der Schaar -
2021 : Mihaela Van Der Schaar Q&A »
Mihaela van der Schaar -
2021 : Mihaela Van Der Schaar »
Mihaela van der Schaar -
2021 Poster: Invariant Causal Imitation Learning for Generalizable Policies »
Ioana Bica · Daniel Jarrett · Mihaela van der Schaar -
2021 Poster: Explaining Latent Representations with a Corpus of Examples »
Jonathan Crabbe · Zhaozhi Qian · Fergus Imrie · Mihaela van der Schaar -
2021 Poster: Time-series Generation by Contrastive Imitation »
Daniel Jarrett · Ioana Bica · Mihaela van der Schaar -
2021 Poster: Closing the loop in medical decision support by understanding clinical decision-making: A case study on organ transplantation »
Yuchao Qin · Fergus Imrie · Alihan Hüyük · Daniel Jarrett · Alexander Gimson · Mihaela van der Schaar -
2021 Poster: DECAF: Generating Fair Synthetic Data Using Causally-Aware Generative Networks »
Boris van Breugel · Trent Kyono · Jeroen Berrevoets · Mihaela van der Schaar -
2021 Poster: MIRACLE: Causally-Aware Imputation via Learning Missing Data Mechanisms »
Trent Kyono · Yao Zhang · Alexis Bellot · Mihaela van der Schaar -
2021 Poster: Conformal Time-series Forecasting »
Kamile Stankeviciute · Ahmed M. Alaa · Mihaela van der Schaar -
2021 Poster: Integrating Expert ODEs into Neural ODEs: Pharmacology and Disease Progression »
Zhaozhi Qian · William Zame · Lucas Fleuren · Paul Elbers · Mihaela van der Schaar -
2021 Poster: SurvITE: Learning Heterogeneous Treatment Effects from Time-to-Event Data »
Alicia Curth · Changhee Lee · Mihaela van der Schaar -
2021 Poster: On Inductive Biases for Heterogeneous Treatment Effect Estimation »
Alicia Curth · Mihaela van der Schaar -
2021 Poster: SyncTwin: Treatment Effect Estimation with Longitudinal Outcomes »
Zhaozhi Qian · Yao Zhang · Ioana Bica · Angela Wood · Mihaela van der Schaar -
2021 Poster: Estimating Multi-cause Treatment Effects via Single-cause Perturbation »
Zhaozhi Qian · Alicia Curth · Mihaela van der Schaar -
2020 Poster: Robust Recursive Partitioning for Heterogeneous Treatment Effects with Uncertainty Quantification »
Hyun-Suk Lee · Yao Zhang · William Zame · Cong Shen · Jang-Won Lee · Mihaela van der Schaar -
2020 Poster: Learning outside the Black-Box: The pursuit of interpretable models »
Jonathan Crabbe · Yao Zhang · William Zame · Mihaela van der Schaar -
2020 Poster: Strictly Batch Imitation Learning by Energy-based Distribution Matching »
Daniel Jarrett · Ioana Bica · Mihaela van der Schaar -
2020 Poster: Estimating the Effects of Continuous-valued Interventions using Generative Adversarial Networks »
Ioana Bica · James Jordon · Mihaela van der Schaar -
2020 Poster: Gradient Regularized V-Learning for Dynamic Treatment Regimes »
Yao Zhang · Mihaela van der Schaar -
2020 Poster: OrganITE: Optimal transplant donor organ offering using an individual treatment effect »
Jeroen Berrevoets · James Jordon · Ioana Bica · Alexander Gimson · Mihaela van der Schaar -
2020 : Q&A for invited speaker, Mihaela van der Schaar »
Mihaela van der Schaar -
2020 : Interpretable AutoML: Powering the machine learning revolution in healthcare in the era of Covid-19 and beyond »
Mihaela van der Schaar -
2020 Poster: CASTLE: Regularization via Auxiliary Causal Graph Discovery »
Trent Kyono · Yao Zhang · Mihaela van der Schaar -
2020 Poster: VIME: Extending the Success of Self- and Semi-supervised Learning to Tabular Domain »
Jinsung Yoon · Yao Zhang · James Jordon · Mihaela van der Schaar -
2020 Poster: When and How to Lift the Lockdown? Global COVID-19 Scenario Analysis and Policy Assessment using Compartmental Gaussian Processes »
Zhaozhi Qian · Ahmed Alaa · Mihaela van der Schaar -
2020 Oral: When and How to Lift the Lockdown? Global COVID-19 Scenario Analysis and Policy Assessment using Compartmental Gaussian Processes »
Zhaozhi Qian · Ahmed Alaa · Mihaela van der Schaar -
2019 : Poster session »
Sebastian Farquhar · Erik Daxberger · Andreas Look · Matt Benatan · Ruiyi Zhang · Marton Havasi · Fredrik Gustafsson · James A Brofos · Nabeel Seedat · Micha Livne · Ivan Ustyuzhaninov · Adam Cobb · Felix D McGregor · Patrick McClure · Tim R. Davidson · Gaurush Hiranandani · Sanjeev Arora · Masha Itkina · Didrik Nielsen · William Harvey · Matias Valdenegro-Toro · Stefano Peluchetti · Riccardo Moriconi · Tianyu Cui · Vaclav Smidl · Taylan Cemgil · Jack Fitzsimons · He Zhao · Mariana Vargas Vieyra · Apratim Bhattacharyya · Rahul Sharma · Geoffroy Dubourg-Felonneau · Jonathan Warrell · Slava Voloshynovskiy · Mihaela Rosca · Jiaming Song · Andrew Ross · Homa Fashandi · Ruiqi Gao · Hooshmand Shokri Razaghi · Joshua Chang · Zhenzhong Xiao · Vanessa Boehm · Giorgio Giannone · Ranganath Krishnan · Joe Davison · Arsenii Ashukha · Jeremiah Liu · Sicong (Sheldon) Huang · Evgenii Nikishin · Sunho Park · Nilesh Ahuja · Mahesh Subedar · Artyom Gadetsky · Jhosimar Arias Figueroa · Tim G. J. Rudner · Waseem Aslam · Adrián Csiszárik · John Moberg · Ali Hebbal · Kathrin Grosse · Pekka Marttinen · Bang An · Hlynur Jónsson · Samuel Kessler · Abhishek Kumar · Mikhail Figurnov · Omesh Tickoo · Steindor Saemundsson · Ari Heljakka · Dániel Varga · Niklas Heim · Simone Rossi · Max Laves · Waseem Gharbieh · Nicholas Roberts · Luis Armando Pérez Rey · Matthew Willetts · Prithvijit Chakrabarty · Sumedh Ghaisas · Carl Shneider · Wray Buntine · Kamil Adamczewski · Xavier Gitiaux · Suwen Lin · Hao Fu · Gunnar Rätsch · Aidan Gomez · Erik Bodin · Dinh Phung · Lennart Svensson · Juliano Tusi Amaral Laganá Pinto · Milad Alizadeh · Jianzhun Du · Kevin Murphy · Beatrix Benkő · Shashaank Vattikuti · Jonathan Gordon · Christopher Kanan · Sontje Ihler · Darin Graham · Michael Teng · Louis Kirsch · Tomas Pevny · Taras Holotyak -
2019 Poster: Attentive State-Space Modeling of Disease Progression »
Ahmed Alaa · Mihaela van der Schaar -
2019 Poster: Demystifying Black-box Models with Symbolic Metamodels »
Ahmed Alaa · Mihaela van der Schaar -
2019 Poster: Time-series Generative Adversarial Networks »
Jinsung Yoon · Daniel Jarrett · Mihaela van der Schaar -
2019 Poster: Differentially Private Bagging: Improved utility and cheaper privacy than subsample-and-aggregate »
James Jordon · Jinsung Yoon · Mihaela van der Schaar -
2019 Poster: Conditional Independence Testing using Generative Adversarial Networks »
Alexis Bellot · Mihaela van der Schaar -
2019 Spotlight: Conditional Independence Testing using Generative Adversarial Networks »
Alexis Bellot · Mihaela van der Schaar -
2018 Poster: Multitask Boosting for Survival Analysis with Competing Risks »
Alexis Bellot · Mihaela van der Schaar -
2018 Poster: Forecasting Treatment Responses Over Time Using Recurrent Marginal Structural Networks »
Bryan Lim · Ahmed M. Alaa · Mihaela van der Schaar -
2017 : Coffee break and Poster Session II »
Mohamed Kane · Albert Haque · Vagelis Papalexakis · John Guibas · Peter Li · Carlos Arias · Eric Nalisnick · Padhraic Smyth · Frank Rudzicz · Xia Zhu · Theodore Willke · Noemie Elhadad · Hans Raffauf · Harini Suresh · Paroma Varma · Yisong Yue · Ognjen (Oggi) Rudovic · Luca Foschini · Syed Rameel Ahmad · Hasham ul Haq · Valerio Maggio · Giuseppe Jurman · Sonali Parbhoo · Pouya Bashivan · Jyoti Islam · Mirco Musolesi · Chris Wu · Alexander Ratner · Jared Dunnmon · Cristóbal Esteban · Aram Galstyan · Greg Ver Steeg · Hrant Khachatrian · Marc Górriz · Mihaela van der Schaar · Anton Nemchenko · Manasi Patwardhan · Tanay Tandon -
2017 Poster: DPSCREEN: Dynamic Personalized Screening »
Kartik Ahuja · William Zame · Mihaela van der Schaar -
2017 Poster: Deep Multi-task Gaussian Processes for Survival Analysis with Competing Risks »
Ahmed M. Alaa · Mihaela van der Schaar -
2017 Spotlight: Deep Multi-task Gaussian Processes for Survival Analysis with Competing Risks »
Ahmed M. Alaa · Mihaela van der Schaar -
2017 Poster: Bayesian Inference of Individualized Treatment Effects using Multi-task Gaussian Processes »
Ahmed M. Alaa · Mihaela van der Schaar -
2016 Poster: Balancing Suspense and Surprise: Timely Decision Making with Endogenous Information Acquisition »
Ahmed M. Alaa · Mihaela van der Schaar -
2016 Poster: A Non-parametric Learning Method for Confidently Estimating Patient's Clinical State and Dynamics »
William Hoiles · Mihaela van der Schaar -
2014 Poster: Discovering, Learning and Exploiting Relevance »
Cem Tekin · Mihaela van der Schaar