Towards trustworthy explanations with gradient-based attribution methods
Ethan Labelson · Rohit Tripathy · Peter Koo
Event URL: https://openreview.net/forum?id=LGgo0wPM2MF

The low interpretability of deep neural networks (DNNs) remains a key barrier to their widespread adoption in the sciences. Attribution methods offer a promising solution, providing feature importance scores that serve as first-order model explanations for a given input. In practice, gradient-based attribution methods, such as saliency maps, can yield noisy importance scores depending on model architecture and training procedure. Here we explore how various regularization techniques affect model explanations with saliency maps using synthetic regulatory genomic data, which allows us to quantitatively assess the efficacy of attribution maps. Strikingly, we find that better generalization performance does not imply better saliency explanations, though, unlike previous observations, we do not observe a clear tradeoff. Interestingly, we find that conventional regularization strategies, when tuned appropriately, can yield high generalization and interpretability performance, similar to what can be achieved with more sophisticated techniques, such as manifold mixup. Our work challenges the conventional wisdom that model selection should be based on test performance alone; another criterion is needed to sub-select models ideally suited for downstream post hoc interpretability for scientific discovery.
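For readers unfamiliar with gradient-based attribution, the sketch below illustrates the general idea of a saliency map: the absolute gradient of a model's output with respect to a one-hot encoded input sequence. This is not the authors' code; the toy CNN, sequence length, and shapes are illustrative assumptions only.

```python
import torch
import torch.nn as nn

class ToyGenomicCNN(nn.Module):
    """Hypothetical 1D CNN for one-hot encoded DNA (4 channels), for illustration only."""
    def __init__(self, n_channels=4):
        super().__init__()
        self.conv = nn.Conv1d(n_channels, 32, kernel_size=19, padding=9)
        self.pool = nn.AdaptiveAvgPool1d(1)
        self.fc = nn.Linear(32, 1)

    def forward(self, x):                 # x: (batch, 4, seq_len)
        h = torch.relu(self.conv(x))
        return self.fc(self.pool(h).squeeze(-1))

def saliency_map(model, x):
    """Return |d(output)/d(input)| per position and nucleotide (same shape as x)."""
    model.eval()
    x = x.clone().requires_grad_(True)
    model(x).sum().backward()             # sum to a scalar so gradients are defined
    return x.grad.abs()

# Usage: score a random one-hot sequence of length 200
seq_len = 200
x = torch.zeros(1, 4, seq_len)
x[0, torch.randint(0, 4, (seq_len,)), torch.arange(seq_len)] = 1.0
scores = saliency_map(ToyGenomicCNN(), x)
print(scores.shape)                       # torch.Size([1, 4, 200])
```

With synthetic data whose ground-truth motifs are known, such per-position scores can be compared against the embedded motif positions to quantify how well the attribution map recovers them, which is the kind of evaluation the abstract describes.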

Author Information

Ethan Labelson (Cold Spring Harbor Laboratory)
Rohit Tripathy (Cold Spring Harbor Laboratory)
Peter Koo (Cold Spring Harbor Laboratory)

I am an Assistant Professor in the Simons Center for Quantitative Biology at Cold Spring Harbor Laboratory. My lab's research explores how representations of biological sequences learned by deep neural networks can provide novel insights into biological processes.
