
[IT5] Natural language descriptions of deep features
Jacob Andreas

Tue Dec 14 12:37 PM -- 01:21 PM (PST) @ None

Despite major efforts in recent years to improve the explainability of deep neural networks, the tools we use for communicating explanations have largely remained the same: visualizations of representative inputs, salient input regions, and local model approximations. But when humans describe complex decision rules, we often use a different explanatory tool: natural language. I'll describe recent work on explaining models for computer vision tasks by automatically constructing natural language descriptions of individual neurons. These descriptions ground prediction in meaningful perceptual and linguistic abstractions, and can be used to surface unexpected model behaviors and to identify and mitigate adversarial vulnerabilities. These results show that fine-grained, automatic annotation of deep network models is both possible and practical: rich, language-based explanations produced by automated annotation procedures can surface meaningful and actionable information about deep networks.
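The core idea of describing a neuron in natural language can be illustrated with a toy sketch: rank inputs by how strongly they activate the neuron, then summarize what the top-activating inputs have in common. The sketch below is an illustrative simplification, not the method from the talk; the `captions` input stands in for the learned description machinery used in the actual work, and the word-overlap summary is a deliberately crude placeholder.

```python
from collections import Counter

def describe_neuron(activations, captions, top_k=3):
    """Crudely "describe" a neuron via its top-activating exemplars.

    activations: one activation value per input image (floats)
    captions: a text caption per input image (hypothetical stand-in for a
        learned caption/description model)
    Returns words shared by more than one of the top-k exemplar captions.
    """
    # Rank inputs by activation strength and keep the top-k exemplars.
    ranked = sorted(range(len(activations)),
                    key=lambda i: activations[i], reverse=True)
    top = ranked[:top_k]
    # Count caption words across exemplars; recurring words form a rough
    # natural-language summary of what drives the neuron.
    counts = Counter(w for i in top for w in captions[i].lower().split())
    return [w for w, c in counts.most_common() if c > 1]

activations = [0.9, 0.1, 0.8, 0.7]
captions = ["a red dog", "blue sky", "a red cat", "a red car"]
print(describe_neuron(activations, captions))
```

Under this toy setup, the neuron's top three exemplars all mention "red", so the sketch would report that word as the neuron's description; the real systems discussed in the talk generate full phrase-level descriptions rather than shared-word lists.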

Author Information

Jacob Andreas (MIT)