Feature importance estimates that inform users about the degree to which given inputs influence the output of a predictive model are crucial for understanding, validating, and interpreting machine-learning models. However, providing fast and accurate estimates of feature importance for high-dimensional data, and quantifying the uncertainty of such estimates, remain open challenges. Here, we frame the task of explaining the decisions of machine-learning models as a causal learning task, and train causal explanation (CXPlain) models that learn to estimate to what degree certain inputs cause outputs in another machine-learning model. Once trained, CXPlain can explain the target model quickly, and it enables quantification of the uncertainty associated with its feature importance estimates via bootstrap ensembling. We present experiments demonstrating that CXPlain is significantly more accurate and faster than existing model-agnostic methods for estimating feature importance. In addition, we confirm that the uncertainty estimates provided by CXPlain ensembles are strongly correlated with their ability to accurately estimate feature importance on held-out data.
Patrick Schwab (ETH Zurich / Roche)
Walter Karlen (ETH Zurich)
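The abstract describes two components: a separate explanation model trained to predict per-feature importance for a fixed target model, and bootstrap ensembling of such explanation models to quantify the uncertainty of the resulting importance estimates. The sketch below is a minimal, hypothetical illustration of these two ideas, not the authors' released implementation; the mean-imputation masking, the squared-error importance targets, the scikit-learn models, and the helper importance_targets are all illustrative assumptions.

# Hypothetical sketch of the two ideas summarized in the abstract: (1) train an
# explanation model to predict per-feature importance for a fixed target model,
# and (2) use a bootstrap ensemble of explanation models for uncertainty.
# All modeling choices here are illustrative assumptions, not the paper's code.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)

# Toy data: the outcome depends mostly on features 0 and 1.
X = rng.normal(size=(500, 5))
y = 3.0 * X[:, 0] + 1.5 * X[:, 1] + 0.1 * rng.normal(size=500)

# Fixed target model whose predictions we want to explain.
target_model = RandomForestRegressor(n_estimators=50, random_state=0).fit(X, y)

def importance_targets(X, y, model):
    """Per-sample importance targets: increase in squared error when a feature
    is mean-imputed (masked), normalised to a distribution over features."""
    base_err = (model.predict(X) - y) ** 2
    deltas = np.zeros_like(X)
    for j in range(X.shape[1]):
        X_masked = X.copy()
        X_masked[:, j] = X[:, j].mean()          # mask feature j
        masked_err = (model.predict(X_masked) - y) ** 2
        deltas[:, j] = np.maximum(masked_err - base_err, 0.0)
    return deltas / (deltas.sum(axis=1, keepdims=True) + 1e-12)

A = importance_targets(X, y, target_model)       # training targets for explainers

# Bootstrap ensemble of explanation models: mean = importance, std = uncertainty.
n_models, estimates = 10, []
for m in range(n_models):
    idx = rng.integers(0, len(X), size=len(X))   # bootstrap resample
    explainer = MLPRegressor(hidden_layer_sizes=(32,), max_iter=500,
                             random_state=m).fit(X[idx], A[idx])
    estimates.append(explainer.predict(X))
estimates = np.stack(estimates)                  # (n_models, n_samples, n_features)

importance_mean = estimates.mean(axis=0)         # feature importance estimates
importance_std = estimates.std(axis=0)           # uncertainty of those estimates
print(importance_mean[0].round(3), importance_std[0].round(3))

Because the explanation models are trained once and then only queried at prediction time, importance estimates for new samples cost a single forward pass per ensemble member, which is the source of the speed advantage the abstract claims over per-sample perturbation methods.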