
Workshop: Attributing Model Behavior at Scale (ATTRIB)

On the Support Vector Effect in DNNs: Rethinking Last Layer Sensitivity-based Instance Attribution

Syed Hasan Amin Mahmood · Rajiv Khanna


As complex predictive models gain popularity, the need for effective explanation techniques has grown. One line of research is dedicated to instance attribution, which attempts to identify the training samples a model relied on to make a given test prediction. Many existing sensitivity-based methods have been shown to be unreliable on large deep networks, and they are often computationally costly at runtime. We rigorously uncover SVM-like behavior in DNNs, which we term the support vector effect (SVE). We use SVE to analyze the limitations of sensitivity-based instance attribution methods, revealing their tendency to behave as class-level methods rather than as the instance-level methods they are intended to be. We therefore advocate reconsidering similarity-based methods, and propose a simple yet highly effective alternative: using the prediction itself as the explanation.
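To make the contrast between sensitivity-based and similarity-based attribution concrete, here is a minimal NumPy sketch. It assumes a linear softmax last layer trained with cross-entropy. The helper names (`last_layer_grad_scores`, `cosine_scores`) and the toy data are hypothetical illustrations, not the authors' code or experimental setup. For such a layer the loss gradient with respect to the weights is `(p - y) h^T`, so a gradient-dot-product score between a test point and a training point factorizes into a residual term times a feature-similarity term. Training points the model already fits confidently have near-zero residuals and therefore receive almost no attribution, loosely analogous to how only support vectors carry weight in an SVM. The sign of the residual term is also driven largely by class labels.

```python
import numpy as np


def softmax(z):
    # Numerically stable softmax over the last axis (works for 1-D and 2-D inputs)
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def last_layer_grad_scores(W, h_train, y_train, h_test, y_test):
    """Hypothetical sensitivity-based score: dot product of last-layer gradients.

    Assumes logits = W @ h with W of shape (C, d), one-hot labels, and
    cross-entropy loss. Then dL/dW = (p - y) h^T, and the Frobenius inner
    product of two such gradients equals (r_test . r_i) * (h_test . h_i),
    where r = p - y is the prediction residual.
    """
    r_train = softmax(h_train @ W.T) - y_train  # (n, C); ~0 for well-fit points
    r_test = softmax(h_test @ W.T) - y_test     # (C,)
    return (r_train @ r_test) * (h_train @ h_test)


def cosine_scores(h_train, h_test):
    """Hypothetical similarity-based score: cosine similarity of last-layer features."""
    num = h_train @ h_test
    den = np.linalg.norm(h_train, axis=1) * np.linalg.norm(h_test)
    return num / den


if __name__ == "__main__":
    # Toy 2-class, 2-feature example (made-up data for illustration only).
    W = np.eye(2)  # class-0 logit = feature 0, class-1 logit = feature 1
    h_train = np.array([
        [10.0, 0.0],  # A: same direction as the test point, confidently fit
        [0.5, 0.5],   # B: class 0 but sitting on the decision boundary
        [0.4, 0.6],   # C: class 1, near the boundary
    ])
    y_train = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
    h_test = np.array([1.0, 0.0])
    y_test = np.array([1.0, 0.0])

    # Gradient scores favor the boundary point B over A, even though A's
    # features point in exactly the same direction as the test point;
    # cosine similarity ranks A first.
    print("gradient scores:", np.round(last_layer_grad_scores(W, h_train, y_train, h_test, y_test), 4))
    print("cosine scores:  ", np.round(cosine_scores(h_train, h_test), 4))
```

In this toy case the gradient score for A is close to zero because A's residual is tiny. That is the kind of behavior that can push sensitivity-based rankings toward a few poorly fit, boundary-like points rather than toward training samples that actually resemble the test input.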
