Biomarker development increasingly draws on heterogeneous data sources, including brain images, biological samples, and social data. Biobanks give access to tens of thousands of brain images along with other social and biomedical data. These large-scale datasets make it possible to model biomedical outcomes using machine learning. To interpret predictive models, it is crucial to understand how input features influence the prediction. Over the past decades, a wide range of methods has been developed for ranking variables according to their importance in predictive models. Given the variety of settings (e.g. dimensionality, non-linearities, classification vs. regression), it remains unclear which method provides the most accurate feature rankings. Benchmarks have been conducted for multiple methods using simulations and empirical validation, yet these efforts have so far remained disconnected because of the diversity of research settings. As a result, some of the most popular methods for estimating variable importance have never been compared. In this work, we extend the literature by systematically comparing the most popular methods for linear and non-linear inputs in classification and regression tasks. For methods that provide an assessment of statistical significance, we checked whether the p-values were well calibrated. We also weighed performance metrics against computation time. Deep Neural Networks (DNN) were the most reliable at ranking variables according to their importance. SHAP values did not provide reliable population-level importance scores, whereas BART and MDI offered a reasonable tradeoff between computation time and reliability, although without statistical guarantees. Marginal selection, knockoffs, and d0CRT did not generalize well when data were non-linear or correlated. Applied to biomarker learning, DNN and BART provided broadly similar importance rankings. Our results emphasize the importance of systematic empirical benchmarks across applied contexts.
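To make the calibration check in the abstract concrete, here is a minimal sketch of one way to test whether p-values are well calibrated. It is not the authors' benchmark code. It assumes a simple marginal permutation test as the importance method, and the function names (`abs_covariance`, `permutation_p_value`, `null_false_positive_rate`), the simulated data, and all parameter values are hypothetical. The idea is that, for features unrelated to the outcome, a calibrated test should reject at level alpha in roughly a fraction alpha of cases.

```python
import random


def abs_covariance(x, y):
    """Absolute (unnormalized) sample covariance between x and y."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    return abs(sum((a - mx) * (b - my) for a, b in zip(x, y)))


def permutation_p_value(x, y, n_perm, rng):
    """Permutation p-value for the marginal association between x and y."""
    observed = abs_covariance(x, y)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs_covariance(x, y_perm) >= observed:
            exceed += 1
    # The +1 terms count the observed statistic itself, which keeps the test valid.
    return (exceed + 1) / (n_perm + 1)


def null_false_positive_rate(n_samples=40, n_null=300, n_perm=99,
                             alpha=0.05, seed=0):
    """Fraction of pure-noise features declared significant at level alpha."""
    rng = random.Random(seed)
    # Hypothetical simulation: the outcome depends on one informative feature.
    signal = [rng.gauss(0, 1) for _ in range(n_samples)]
    y = [s + rng.gauss(0, 1) for s in signal]
    rejections = 0
    for _ in range(n_null):
        # Null features are drawn independently of the outcome.
        null_feature = [rng.gauss(0, 1) for _ in range(n_samples)]
        if permutation_p_value(null_feature, y, n_perm, rng) <= alpha:
            rejections += 1
    return rejections / n_null


if __name__ == "__main__":
    rate = null_false_positive_rate()
    print(f"Empirical false positive rate at alpha=0.05: {rate:.3f}")
```

A rate far above alpha would indicate anti-conservative p-values, and a rate far below it would indicate a conservative test. The same check could in principle be applied to any method that returns p-values, by swapping out `permutation_p_value`.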
Author Information
Ahmad CHAMMA (Inria)
Denis A. Engemann (INRIA Saclay)
Bertrand Thirion (INRIA)
More from the Same Authors
- 2022 Poster: A Conditional Randomization Test for Sparse Logistic Regression in High-Dimension
  Binh T. Nguyen · Bertrand Thirion · Sylvain Arlot
- 2022 Poster: Aligning individual brains with fused unbalanced Gromov Wasserstein
  Alexis Thual · Quang Huy TRAN · Tatiana Zemskova · Nicolas Courty · Rémi Flamary · Stanislas Dehaene · Bertrand Thirion
- 2021 : Session 3 Oral 2
  Ahmad CHAMMA
- 2021 Poster: Shared Independent Component Analysis for Multi-Subject Neuroimaging
  Hugo Richard · Pierre Ablin · Bertrand Thirion · Alexandre Gramfort · Aapo Hyvarinen
- 2020 Poster: Modeling Shared responses in Neuroimaging Studies through MultiView ICA
  Hugo Richard · Luigi Gresele · Aapo Hyvarinen · Bertrand Thirion · Alexandre Gramfort · Pierre Ablin
- 2020 Spotlight: Modeling Shared responses in Neuroimaging Studies through MultiView ICA
  Hugo Richard · Luigi Gresele · Aapo Hyvarinen · Bertrand Thirion · Alexandre Gramfort · Pierre Ablin
- 2020 Poster: Statistical control for spatio-temporal MEG/EEG source imaging with desparsified multi-task Lasso
  Jerome-Alexis Chevalier · Joseph Salmon · Alexandre Gramfort · Bertrand Thirion
- 2019 Poster: Manifold-regression to predict from MEG/EEG brain signals without source modeling
  David Sabbagh · Pierre Ablin · Gael Varoquaux · Alexandre Gramfort · Denis A. Engemann
- 2017 Poster: Learning Neural Representations of Human Cognition across Many fMRI Studies
  Arthur Mensch · Julien Mairal · Danilo Bzdok · Bertrand Thirion · Gael Varoquaux
- 2016 Poster: Learning brain regions via large-scale online structured sparse dictionary learning
  Elvis DOHMATOB · Arthur Mensch · Gael Varoquaux · Bertrand Thirion
- 2015 Poster: Semi-Supervised Factored Logistic Regression for High-Dimensional Neuroimaging Data
  Danilo Bzdok · Michael Eickenberg · Olivier Grisel · Bertrand Thirion · Gael Varoquaux
- 2013 Poster: Mapping paradigm ontologies to and from the brain
  Yannick Schwartz · Bertrand Thirion · Gael Varoquaux
- 2011 Workshop: Machine Learning and Interpretation in Neuroimaging (MLINI-2011)
  Melissa K Carroll · Guillermo Cecchi · Kai-min K Chang · Moritz Grosse-Wentrup · James Haxby · Georg Langs · Anna Korhonen · Bjoern Menze · Brian Murphy · Janaina Mourao-Miranda · Vittorio Murino · Francisco Pereira · Irina Rish · Mert Sabuncu · Irina Simanova · Bertrand Thirion
- 2010 Poster: Brain covariance selection: better individual functional connectivity models using population prior
  Gaël Varoquaux · Alexandre Gramfort · Jean-Baptiste Poline · Bertrand Thirion
- 2009 Poster: Discriminative Network Models of Schizophrenia
  Guillermo Cecchi · Irina Rish · Benjamin Thyreau · Bertrand Thirion · Marion Plaze · Jean-Luc Martinot · Marie Laure Paillere-Martinot · Jean-Baptiste Poline
- 2009 Oral: Discriminative Network Models of Schizophrenia
  Guillermo Cecchi · Irina Rish · Benjamin Thyreau · Bertrand Thirion · Marion Plaze · Jean-Luc Martinot · Marie Laure Paillere-Martinot · Jean-Baptiste Poline