
A Benchmark for Interpretability Methods in Deep Neural Networks
Sara Hooker · Dumitru Erhan · Pieter-Jan Kindermans · Been Kim

Tue Dec 10 10:45 AM -- 12:45 PM (PST) @ East Exhibition Hall B + C #159

We propose an empirical measure of the approximate accuracy of feature importance estimates in deep neural networks. Our results across several large-scale image classification datasets show that many popular interpretability methods produce estimates of feature importance that are no better than a random designation of feature importance. Only certain ensemble-based approaches---VarGrad and SmoothGrad-Squared---outperform such a random assignment of importance. The manner of ensembling remains critical: we show that some approaches do no better than the underlying method but carry a far higher computational burden.
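Both ensemble estimators named in the abstract aggregate gradients over noisy copies of the input: SmoothGrad-Squared averages the squared gradients, while VarGrad takes their variance. A minimal NumPy sketch (the toy `grad_fn` and the parameter defaults are illustrative assumptions, not the paper's setup):

```python
import numpy as np

def grad_fn(x):
    # Toy stand-in for a model's input gradient: f(x) = sum(x**2), so grad = 2x.
    # In practice this would be the gradient of a class logit w.r.t. the image.
    return 2.0 * x

def ensemble_attributions(x, grad_fn, n_samples=50, sigma=0.15, seed=0):
    """Sketch of SmoothGrad-Squared and VarGrad over Gaussian-perturbed inputs."""
    rng = np.random.default_rng(seed)
    # Collect gradients at n_samples noisy copies of the input.
    grads = np.stack([grad_fn(x + rng.normal(0.0, sigma, x.shape))
                      for _ in range(n_samples)])
    smoothgrad_sq = (grads ** 2).mean(axis=0)  # mean of squared gradients
    vargrad = grads.var(axis=0)                # variance of gradients
    return smoothgrad_sq, vargrad
```

Since E[g²] = Var[g] + E[g]², SmoothGrad-Squared is always at least as large as VarGrad per feature; the two differ by the squared mean gradient, which is one way to see that the choice of ensembling statistic matters.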

Author Information

Sara Hooker (Google Brain)

I lead Cohere For AI, a non-profit research lab that seeks to solve complex machine learning problems. We support fundamental research that explores the unknown, and are focused on creating more points of entry into machine learning research. Prior to Cohere, I was a research scientist at Google Brain working on training models that go beyond test-set accuracy to fulfill multiple desired criteria -- interpretable, compact, fair, and robust. I enjoy working on research problems where progress translates to reliable and accessible machine learning in the real world. My research interests include algorithm transparency, security, and privacy.

Dumitru Erhan (Google Brain)
Pieter-Jan Kindermans (Google Brain)
Been Kim (Google)
