
Regularizing Black-box Models for Improved Interpretability
Gregory Plumb · Maruan Al-Shedivat · Ángel Alexander Cabrera · Adam Perer · Eric Xing · Ameet Talwalkar

Wed Dec 09 09:00 AM -- 11:00 AM (PST) @ Poster Session 3 #1078

Most work on interpretable machine learning has focused on designing either inherently interpretable models, which typically trade off accuracy for interpretability, or post-hoc explanation systems, whose explanation quality can be unpredictable. Our method, ExpO, is a hybrid of these approaches that regularizes a model for explanation quality at training time. Importantly, these regularizers are differentiable, model agnostic, and require no domain knowledge to define. We demonstrate that post-hoc explanations for ExpO-regularized models have better explanation quality, as measured by the common fidelity and stability metrics. Through a user study on a realistic task, we verify that improving these metrics leads to significantly more useful explanations.
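To make "regularizing for fidelity" concrete, here is a minimal sketch of a neighborhood-fidelity penalty. It assumes a LIME-style local linear surrogate fit by least squares to the model's outputs on Gaussian perturbations of an input `x`; the function name `neighborhood_fidelity_penalty` and the parameters `sigma` and `num_samples` are hypothetical, and this is an illustration, not the authors' implementation. The penalty is the mean squared error of the local linear fit, so it is near zero wherever a linear explanation describes the model well.

```python
import numpy as np


def neighborhood_fidelity_penalty(f, x, sigma=0.1, num_samples=20, rng=None):
    """Hypothetical fidelity penalty (illustrative sketch, not ExpO's code).

    Fits a local linear surrogate g(z) = w.z + b to the model f on random
    points near x and returns the mean squared error of that fit.
    """
    rng = np.random.default_rng(rng)
    # Assumed sampling scheme: isotropic Gaussian neighborhood around x.
    z = x + sigma * rng.standard_normal((num_samples, x.shape[0]))
    y = f(z)  # model predictions on the neighborhood, shape (num_samples,)
    # Design matrix with a bias column for the local linear model.
    a = np.hstack([z, np.ones((num_samples, 1))])
    coef, *_ = np.linalg.lstsq(a, y, rcond=None)
    residual = y - a @ coef
    # Mean squared error of the linear surrogate: lower means higher fidelity.
    return float(np.mean(residual ** 2))


x = np.zeros(2)
linear_model = lambda z: z @ np.array([1.0, -2.0]) + 3.0
wiggly_model = lambda z: np.sin(20.0 * z[:, 0])

print(neighborhood_fidelity_penalty(linear_model, x, rng=0))  # close to 0
print(neighborhood_fidelity_penalty(wiggly_model, x, rng=0))  # clearly > 0
```

Under the same assumptions, training would minimize something like `task_loss + gamma * penalty` averaged over training inputs, with `gamma` a hypothetical weighting hyperparameter. The sketch uses NumPy for readability; to actually train with it, the same computation would need to be written in an autodiff framework so gradients flow through the least-squares fit.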

Author Information

Gregory Plumb (Carnegie Mellon University)
Maruan Al-Shedivat (Carnegie Mellon University)
Ángel Alexander Cabrera (Carnegie Mellon University)
Adam Perer (Carnegie Mellon University)
Eric Xing (Petuum Inc. / Carnegie Mellon University)
Ameet Talwalkar (Carnegie Mellon University)
