Feature importance (FI) estimates are a popular form of explanation, and they are commonly created and evaluated by computing the change in model confidence caused by removing certain input features at test time. For example, the standard Sufficiency metric keeps only the top-k most important tokens and measures how much model confidence changes on this reduced input. In this paper, we study several under-explored dimensions of FI explanations, providing conceptual and empirical improvements for this form of explanation. First, we advance a new argument for why it can be problematic to remove features from an input when creating or evaluating explanations: because these counterfactual inputs are out-of-distribution (OOD) for the model, the resulting explanations are socially misaligned. The crux of the problem is that the model's prior and random weight initialization influence the explanations (and explanation metrics) in unintended ways. To resolve this issue, we propose a simple alteration to the model training process, which results in more socially aligned explanations and metrics. Second, we compare five approaches for removing features from model inputs. We find that some methods produce more OOD counterfactuals than others, and we make recommendations for selecting a feature-replacement function. Finally, we introduce four search-based methods for identifying FI explanations and compare them to strong baselines, including LIME, Anchors, and Integrated Gradients. Through experiments with six diverse text classification datasets, we find that the only method that consistently outperforms random search is a Parallel Local Search (PLS) that we introduce. Improvements over the second-best method are as large as 5.4 points for Sufficiency and 17 points for Comprehensiveness.
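To make the evaluation concrete, here is a minimal sketch (not the authors' code) of the two removal-based metrics the abstract refers to. It assumes a `model` callable that maps a token list to the predicted class's probability and a per-token `importance` list; replacing removed tokens with a `"[MASK]"` string is just one possible feature-replacement choice, named here only for illustration.

```python
from typing import Callable, List, Set

def ablate(tokens: List[str], keep: Set[int],
           replace_fn: Callable[[str], str]) -> List[str]:
    """Replace every token whose index is not in `keep`."""
    return [tok if i in keep else replace_fn(tok) for i, tok in enumerate(tokens)]

def top_k_indices(importance: List[float], k: int) -> Set[int]:
    """Indices of the k highest-importance tokens."""
    return set(sorted(range(len(importance)),
                      key=lambda i: importance[i], reverse=True)[:k])

def sufficiency(model, tokens, importance, k, replace_fn=lambda t: "[MASK]"):
    """Confidence drop when only the top-k tokens are kept (lower is better)."""
    keep = top_k_indices(importance, k)
    return model(tokens) - model(ablate(tokens, keep, replace_fn))

def comprehensiveness(model, tokens, importance, k, replace_fn=lambda t: "[MASK]"):
    """Confidence drop when the top-k tokens are removed (higher is better)."""
    keep = set(range(len(tokens))) - top_k_indices(importance, k)
    return model(tokens) - model(ablate(tokens, keep, replace_fn))
```

The `replace_fn` argument is where the paper's second question enters: replacing removed tokens with a mask token, deleting them outright, or substituting other values each yield different counterfactual inputs, some further out-of-distribution than others.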
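The search-based methods can be read against the same harness: because an explanation is scored directly by these metrics, one can search over which k tokens to keep. The sketch below is a generic single-chain local search over token subsets, given only as an illustrative baseline; the authors' Parallel Local Search is not specified here and presumably differs (e.g., by running multiple searches in parallel). It reuses `ablate` from the block above and assumes 0 < k < len(tokens).

```python
import random

def local_search_explanation(model, tokens, k, iters=100,
                             replace_fn=lambda t: "[MASK]"):
    """Hill-climbing search for a size-k token subset that maximizes model
    confidence when only those tokens are kept (i.e., minimizes Sufficiency).
    Illustrative only; not the authors' PLS."""
    n = len(tokens)
    current = set(random.sample(range(n), k))
    score = lambda keep: model(ablate(tokens, keep, replace_fn))
    best = score(current)
    for _ in range(iters):
        # Propose swapping one kept token for one removed token.
        swap_out = random.choice(sorted(current))
        swap_in = random.choice(sorted(set(range(n)) - current))
        candidate = (current - {swap_out}) | {swap_in}
        if (s := score(candidate)) > best:
            current, best = candidate, s
    return current
```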
Author Information
Peter Hase (University of North Carolina, Chapel Hill)
Harry Xie (Department of Computer Science, University of North Carolina, Chapel Hill)
Mohit Bansal (University of North Carolina, Chapel Hill)
More from the Same Authors
- 2021 : VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation
  Linjie Li · Jie Lei · Zhe Gan · Licheng Yu · Yen-Chun Chen · Rohit Pillai · Yu Cheng · Luowei Zhou · Xin Wang · William Yang Wang · Tamara L Berg · Mohit Bansal · Jingjing Liu · Lijuan Wang · Zicheng Liu
- 2022 : LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning
  Yi-Lin Sung · Jaemin Cho · Mohit Bansal
- 2023 Poster: Visual Programming for Step-by-Step Text-to-Image Generation and Evaluation
  Jaemin Cho · Abhay Zala · Mohit Bansal
- 2023 Poster: Resolving Interference When Merging Models
  Prateek Yadav · Derek Tam · Leshem Choshen · Colin Raffel · Mohit Bansal
- 2023 Poster: PanoGen: Text-Conditioned Panoramic Environment Generation for Vision-and-Language Navigation
  Jialu Li · Mohit Bansal
- 2023 Poster: Self-Chained Image-Language Model for Video Localization and Question Answering
  Shoubin Yu · Jaemin Cho · Prateek Yadav · Mohit Bansal
- 2023 Poster: Paxion: Patching Action Knowledge in Video-Language Foundation Models
  Zhenhailong Wang · Ansel Blume · Sha Li · Genglin Liu · Jaemin Cho · Zineng Tang · Mohit Bansal · Heng Ji
- 2023 Poster: Can Language Models Teach? Teacher Explanations Improve Student Performance via Theory of Mind
  Swarnadeep Saha · Peter Hase · Mohit Bansal
- 2023 Poster: Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models
  Peter Hase · Mohit Bansal · Been Kim · Asma Ghandeharioun
- 2023 Poster: Adaptive Contextual Perception: How To Generalize To New Backgrounds and Ambiguous Objects
  Zhuofan Ying · Peter Hase · Mohit Bansal
- 2023 Poster: Any-to-Any Generation via Composable Diffusion
  Zineng Tang · Ziyi Yang · Chenguang Zhu · Michael Zeng · Mohit Bansal
- 2022 Poster: TVLT: Textless Vision-Language Transformer
  Zineng Tang · Jaemin Cho · Yixin Nie · Mohit Bansal
- 2022 Poster: Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
  Zhenhailong Wang · Manling Li · Ruochen Xu · Luowei Zhou · Jie Lei · Xudong Lin · Shuohang Wang · Ziyi Yang · Chenguang Zhu · Derek Hoiem · Shih-Fu Chang · Mohit Bansal · Heng Ji
- 2022 Poster: LST: Ladder Side-Tuning for Parameter and Memory Efficient Transfer Learning
  Yi-Lin Sung · Jaemin Cho · Mohit Bansal
- 2022 Poster: Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning
  Haokun Liu · Derek Tam · Mohammed Muqeeth · Jay Mohta · Tenghao Huang · Mohit Bansal · Colin Raffel
- 2022 Poster: VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives
  Zhuofan Ying · Peter Hase · Mohit Bansal
- 2022 Poster: WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models
  Yonatan Bitton · Nitzan Bitton Guetta · Ron Yosef · Yuval Elovici · Mohit Bansal · Gabriel Stanovsky · Roy Schwartz
- 2021 Poster: VidLanKD: Improving Language Understanding via Video-Distilled Knowledge Transfer
  Zineng Tang · Jaemin Cho · Hao Tan · Mohit Bansal
- 2021 Poster: Detecting Moments and Highlights in Videos via Natural Language Queries
  Jie Lei · Tamara L Berg · Mohit Bansal
- 2020 Workshop: HAMLETS: Human And Model in the Loop Evaluation and Training Strategies
  Divyansh Kaushik · Bhargavi Paranjape · Forough Arabshahi · Yanai Elazar · Yixin Nie · Max Bartolo · Polina Kirichenko · Pontus Lars Erik Saito Stenetorp · Mohit Bansal · Zachary Lipton · Douwe Kiela
- 2017 Demonstration: Interactive-Length Multi-Task Video Captioning with Cooperative Feedback
  Han Guo · Ramakanth Pasunuru · Mohit Bansal