How to learn a good predictor on data with missing values? Most efforts focus on first imputing as well as possible and then learning on the completed data to predict the outcome. Yet, this widespread practice has no theoretical grounding. Here we show that for almost all imputation functions, an impute-then-regress procedure with a powerful learner is Bayes optimal. This result holds for all missing-values mechanisms, in contrast with the classic statistical results that require missing-at-random settings to use imputation in probabilistic modeling. Moreover, it implies that perfect conditional imputation is not needed for good prediction asymptotically. In fact, we show that on perfectly imputed data the best regression function will generally be discontinuous, which makes it hard to learn. Crafting the imputation instead so as to leave the regression function unchanged simply shifts the problem to learning discontinuous imputations. Rather, we suggest that it is easier to learn imputation and regression jointly. We propose such a procedure, adapting NeuMiss, a neural network capturing the conditional links across observed and unobserved variables whatever the missing-value pattern. Our experiments confirm that joint imputation and regression through NeuMiss is better than various two-step procedures in a finite-sample regime.
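The impute-then-regress idea described above can be illustrated with a small toy sketch. This is not the authors' NeuMiss procedure or their experimental setup: the synthetic data, the function names (`make_data`, `impute_then_featurize`, `fit_and_predict`), the constant imputation, the appended missingness indicator, and the linear least-squares learner are all assumptions made for illustration. The indicator is added so that a plain linear model is flexible enough in this toy case; the abstract's result concerns powerful learners in general. The sketch fits the same regressor after imputing with two very different constants and compares the resulting predictions.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_data(n):
    """Toy data (assumed): y depends linearly on two features; the
    second feature is missing completely at random 30% of the time."""
    X = rng.normal(size=(n, 2))
    y = X[:, 0] + 2.0 * X[:, 1] + 0.1 * rng.normal(size=n)
    X_miss = X.copy()
    X_miss[rng.random(n) < 0.3, 1] = np.nan
    return X_miss, y


def impute_then_featurize(X, fill_value):
    """Impute NaNs with a constant, then append an intercept column
    and the missingness indicator of each feature."""
    mask = np.isnan(X)
    X_imp = np.where(mask, fill_value, X)
    return np.column_stack([np.ones(len(X)), X_imp, mask.astype(float)])


def fit_and_predict(X_train, y_train, X_test, fill_value):
    """Least-squares regression on the imputed training features,
    followed by prediction on the identically imputed test features."""
    A_train = impute_then_featurize(X_train, fill_value)
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return impute_then_featurize(X_test, fill_value) @ coef


X_train, y_train = make_data(2000)
X_test, y_test = make_data(500)

# Two deliberately different imputation constants.
pred_zero = fit_and_predict(X_train, y_train, X_test, fill_value=0.0)
pred_far = fit_and_predict(X_train, y_train, X_test, fill_value=100.0)

print("max prediction gap:", np.max(np.abs(pred_zero - pred_far)))
print("test MSE (fill 0):", np.mean((y_test - pred_zero) ** 2))
```

In this toy setting the two fits span the same set of functions, so the predictions should agree up to numerical error whichever constant is used. This is only a finite-sample, linear illustration of why the choice of imputation need not matter when the regressor can adapt to it.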
Author Information
Marine Le Morvan (INRIA)
Julie Josse (INRIA/CMAP)
Erwan Scornet (Ecole Polytechnique)
Gael Varoquaux (INRIA)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Spotlight: What’s a good imputation to predict with missing values?
More from the Same Authors
- 2021: AI as statistical methods for imperfect theories
  Gael Varoquaux
- 2022 Poster: Why do tree-based models still outperform deep learning on typical tabular data?
  Leo Grinsztajn · Edouard Oyallon · Gael Varoquaux
- 2020 Poster: Estimation and Imputation in Probabilistic Principal Component Analysis with Missing Not At Random Data
  Aude Sportisse · Claire Boyer · Julie Josse
- 2020 Poster: Debiasing Averaged Stochastic Gradient Descent to handle missing values
  Aude Sportisse · Claire Boyer · Aymeric Dieuleveut · Julie Josse
- 2020 Poster: NeuMiss networks: differentiable programming for supervised learning with missing values.
  Marine Le Morvan · Julie Josse · Thomas Moreau · Erwan Scornet · Gael Varoquaux
- 2020 Oral: NeuMiss networks: differentiable programming for supervised learning with missing values.
  Marine Le Morvan · Julie Josse · Thomas Moreau · Erwan Scornet · Gael Varoquaux
- 2020 Session: Orals & Spotlights Track 19: Probabilistic/Causality
  Julie Josse · Jasper Snoek
- 2019 Poster: Comparing distributions: $\ell_1$ geometry improves kernel two-sample testing
  Meyer Scetbon · Gael Varoquaux
- 2019 Spotlight: Comparing distributions: $\ell_1$ geometry improves kernel two-sample testing
  Meyer Scetbon · Gael Varoquaux
- 2019 Poster: Manifold-regression to predict from MEG/EEG brain signals without source modeling
  David Sabbagh · Pierre Ablin · Gael Varoquaux · Alexandre Gramfort · Denis A. Engemann
- 2017: Scikit-learn & nilearn: Democratisation of machine learning for brain imaging (INRIA)
  Gael Varoquaux
- 2017: Invited Talk: "Tales from fMRI: Learning from limited labeled data"
  Gael Varoquaux
- 2017 Poster: Universal consistency and minimax rates for online Mondrian Forests
  Jaouad Mourtada · Stéphane Gaïffas · Erwan Scornet
- 2017 Poster: Learning Neural Representations of Human Cognition across Many fMRI Studies
  Arthur Mensch · Julien Mairal · Danilo Bzdok · Bertrand Thirion · Gael Varoquaux
- 2016 Poster: Learning brain regions via large-scale online structured sparse dictionary learning
  Elvis DOHMATOB · Arthur Mensch · Gael Varoquaux · Bertrand Thirion
- 2015 Poster: Semi-Supervised Factored Logistic Regression for High-Dimensional Neuroimaging Data
  Danilo Bzdok · Michael Eickenberg · Olivier Grisel · Bertrand Thirion · Gael Varoquaux
- 2013 Poster: Mapping paradigm ontologies to and from the brain
  Yannick Schwartz · Bertrand Thirion · Gael Varoquaux