Poster
IA-RED$^2$: Interpretability-Aware Redundancy Reduction for Vision Transformers
Bowen Pan · Rameswar Panda · Yifan Jiang · Zhangyang Wang · Rogerio Feris · Aude Oliva
The self-attention-based model, the transformer, has recently become the leading backbone in computer vision. Despite the impressive success of transformers across a variety of vision tasks, they still suffer from heavy computation and intensive memory costs. To address this limitation, this paper presents an Interpretability-Aware REDundancy REDuction framework (IA-RED$^2$). We start by observing a large amount of redundant computation, mainly spent on uncorrelated input patches, and then introduce an interpretable module to dynamically and gracefully drop these redundant patches. This framework is then extended to a hierarchical structure, where uncorrelated tokens are gradually removed at successive stages, considerably shrinking the computational cost. We include extensive experiments on both image and video tasks, where our method delivers up to 1.4x speed-up for state-of-the-art models such as DeiT and TimeSformer while sacrificing less than 0.7% accuracy. More importantly, unlike other acceleration approaches, our method is inherently interpretable with substantial visual evidence, making the vision transformer both lighter and closer to a human-understandable architecture. We demonstrate, with both qualitative and quantitative results, that the interpretability that naturally emerges in our framework can outperform the raw attention learned by the original vision transformer, as well as the interpretations produced by off-the-shelf methods. Project Page: http://people.csail.mit.edu/bpan/ia-red/.
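To make the mechanism concrete, below is a minimal PyTorch sketch of the core idea: a lightweight policy head scores each patch token, and the lowest-scoring tokens are dropped before each successive stage of transformer blocks. This is an illustration, not the authors' implementation: the `PatchScorer` head, the hard top-k pruning, and the fixed 0.7 keep ratio per stage are all assumptions. The hard top-k is not differentiable, so treat this as inference-style pruning; training such policy modules requires a differentiable or reward-based surrogate.

```python
# Minimal sketch of hierarchical token dropping in the spirit of IA-RED^2.
# PatchScorer and the per-stage keep-ratio schedule are illustrative assumptions.
import torch
import torch.nn as nn


class PatchScorer(nn.Module):
    """Lightweight policy head scoring how informative each token is."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Sequential(nn.LayerNorm(dim), nn.Linear(dim, 1))

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, num_tokens, dim) -> (batch, num_tokens) keep scores
        return self.score(tokens).squeeze(-1)


def drop_redundant_tokens(tokens: torch.Tensor, scores: torch.Tensor,
                          keep_ratio: float) -> torch.Tensor:
    """Keep the top-k highest-scoring patch tokens; the CLS token (index 0) is always kept."""
    cls_tok, patch_toks = tokens[:, :1], tokens[:, 1:]
    patch_scores = scores[:, 1:]
    k = max(1, int(patch_toks.size(1) * keep_ratio))
    idx = patch_scores.topk(k, dim=1).indices            # (batch, k)
    idx = idx.unsqueeze(-1).expand(-1, -1, patch_toks.size(-1))
    kept = patch_toks.gather(1, idx)                     # gather surviving patches
    return torch.cat([cls_tok, kept], dim=1)


if __name__ == "__main__":
    # Toy forward pass: three "stages" of transformer blocks, each preceded
    # by a scorer that prunes the least informative patches.
    dim, batch, num_patches = 192, 2, 196
    x = torch.randn(batch, num_patches + 1, dim)         # +1 for the CLS token
    stages = nn.ModuleList(
        nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        for _ in range(3)
    )
    scorers = nn.ModuleList(PatchScorer(dim) for _ in range(3))
    for blk, scorer in zip(stages, scorers):
        x = drop_redundant_tokens(x, scorer(x), keep_ratio=0.7)  # assumed schedule
        x = blk(x)
    print(x.shape)  # patch count shrinks stage by stage: 196 -> 137 -> 95 -> 66
```

Always exempting the CLS token from pruning keeps the classification pathway intact while the spatial tokens are thinned; the per-token scores themselves double as a saliency map over input patches, which is where the interpretability of this style of pruning comes from.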
Author Information
Bowen Pan (Massachusetts Institute of Technology)
Rameswar Panda (MIT-IBM Watson AI Lab)
Yifan Jiang (The University of Texas at Austin)
Zhangyang Wang (UT Austin)
Rogerio Feris (MIT-IBM Watson AI Lab, IBM Research)
Aude Oliva (Massachusetts Institute of Technology)
More from the Same Authors
- 2021 : Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting »
  Benjamin Wilson · William Qi · Tanmay Agarwal · John Lambert · Jagjeet Singh · Siddhesh Khandelwal · Bowen Pan · Ratnesh Kumar · Andrew Hartnett · Jhony Kaesemodel Pontes · Deva Ramanan · Peter Carr · James Hays
- 2021 : Select, Label, and Mix: Learning Discriminative Invariant Feature Representations for Partial Domain Adaptation »
  Aadarsh Sahoo · Rameswar Panda · Rogerio Feris · Kate Saenko · Abir Das
- 2022 Poster: Procedural Image Programs for Representation Learning »
  Manel Baradad · Richard Chen · Jonas Wulff · Tongzhou Wang · Rogerio Feris · Antonio Torralba · Phillip Isola
- 2022 Poster: How Transferable are Video Representations Based on Synthetic Data? »
  Yo-whan Kim · Samarth Mishra · SouYoung Jin · Rameswar Panda · Hilde Kuehne · Leonid Karlinsky · Venkatesh Saligrama · Kate Saenko · Aude Oliva · Rogerio Feris
- 2022 Poster: FETA: Towards Specializing Foundational Models for Expert Task Applications »
  Amit Alfassy · Assaf Arbelle · Oshri Halimi · Sivan Harary · Roei Herzig · Eli Schwartz · Rameswar Panda · Michele Dolfi · Christoph Auer · Peter Staar · Kate Saenko · Rogerio Feris · Leonid Karlinsky
- 2021 Poster: Improving Contrastive Learning on Imbalanced Data via Open-World Sampling »
  Ziyu Jiang · Tianlong Chen · Ting Chen · Zhangyang Wang
- 2021 Poster: Sparse Training via Boosting Pruning Plasticity with Neuroregeneration »
  Shiwei Liu · Tianlong Chen · Xiaohan Chen · Zahra Atashgahi · Lu Yin · Huanyu Kou · Li Shen · Mykola Pechenizkiy · Zhangyang Wang · Decebal Constantin Mocanu
- 2021 Poster: Stronger NAS with Weaker Predictors »
  Junru Wu · Xiyang Dai · Dongdong Chen · Yinpeng Chen · Mengchen Liu · Ye Yu · Zhangyang Wang · Zicheng Liu · Mei Chen · Lu Yuan
- 2021 Poster: Dynamic Distillation Network for Cross-Domain Few-Shot Recognition with Unlabeled Data »
  Ashraful Islam · Chun-Fu (Richard) Chen · Rameswar Panda · Leonid Karlinsky · Rogerio Feris · Richard J. Radke
- 2021 Poster: Hyperparameter Tuning is All You Need for LISTA »
  Xiaohan Chen · Jialin Liu · Zhangyang Wang · Wotao Yin
- 2021 Poster: Chasing Sparsity in Vision Transformers: An End-to-End Exploration »
  Tianlong Chen · Yu Cheng · Zhe Gan · Lu Yuan · Lei Zhang · Zhangyang Wang
- 2021 Poster: Data-Efficient GAN Training Beyond (Just) Augmentations: A Lottery Ticket Perspective »
  Tianlong Chen · Yu Cheng · Zhe Gan · Jingjing Liu · Zhangyang Wang
- 2021 Poster: TransGAN: Two Pure Transformers Can Make One Strong GAN, and That Can Scale Up »
  Yifan Jiang · Shiyu Chang · Zhangyang Wang
- 2021 Poster: AugMax: Adversarial Composition of Random Augmentations for Robust Training »
  Haotao Wang · Chaowei Xiao · Jean Kossaifi · Zhiding Yu · Anima Anandkumar · Zhangyang Wang
- 2021 Poster: Delayed Propagation Transformer: A Universal Computation Engine towards Practical Control in Cyber-Physical Systems »
  Wenqing Zheng · Qiangqiang Guo · Hao Yang · Peihao Wang · Zhangyang Wang
- 2021 Poster: The Elastic Lottery Ticket Hypothesis »
  Xiaohan Chen · Yu Cheng · Shuohang Wang · Zhe Gan · Jingjing Liu · Zhangyang Wang
- 2021 Poster: Sanity Checks for Lottery Tickets: Does Your Winning Ticket Really Win the Jackpot? »
  Xiaolong Ma · Geng Yuan · Xuan Shen · Tianlong Chen · Xuxi Chen · Xiaohan Chen · Ning Liu · Minghai Qin · Sijia Liu · Zhangyang Wang · Yanzhi Wang
- 2021 Poster: Contrast and Mix: Temporal Contrastive Video Domain Adaptation with Background Mixing »
  Aadarsh Sahoo · Rutav Shah · Rameswar Panda · Kate Saenko · Abir Das
- 2021 Poster: You are caught stealing my winning lottery ticket! Making a lottery ticket claim its ownership »
  Xuxi Chen · Tianlong Chen · Zhenyu Zhang · Zhangyang Wang
- 2020 Poster: AdaShare: Learning What To Share For Efficient Deep Multi-Task Learning »
  Ximeng Sun · Rameswar Panda · Rogerio Feris · Kate Saenko
- 2019 : Adaptive Multi-Task Neural Networks for Efficient Inference »
  Rogerio Feris
- 2019 Workshop: AI for Humanitarian Assistance and Disaster Response »
  Ritwik Gupta · Robin Murphy · Trevor Darrell · Eric Heim · Zhangyang Wang · Bryce Goodman · Piotr Biliński
- 2019 Poster: E2-Train: Training State-of-the-art CNNs with Over 80% Less Energy »
  Ziyu Jiang · Yue Wang · Xiaohan Chen · Pengfei Xu · Yang Zhao · Yingyan Lin · Zhangyang Wang
- 2019 Poster: Learning to Optimize in Swarms »
  Yue Cao · Tianlong Chen · Zhangyang Wang · Yang Shen
- 2019 Poster: Model Compression with Adversarial Robustness: A Unified Optimization Framework »
  Shupeng Gui · Haotao Wang · Haichuan Yang · Chen Yu · Zhangyang Wang · Ji Liu
- 2018 Poster: Can We Gain More from Orthogonality Regularizations in Training Deep Networks? »
  Nitin Bansal · Xiaohan Chen · Zhangyang Wang
- 2018 Poster: Delta-encoder: an effective sample synthesis method for few-shot object recognition »
  Eli Schwartz · Leonid Karlinsky · Joseph Shtok · Sivan Harary · Mattias Marder · Abhishek Kumar · Rogerio Feris · Raja Giryes · Alex Bronstein
- 2018 Spotlight: Delta-encoder: an effective sample synthesis method for few-shot object recognition »
  Eli Schwartz · Leonid Karlinsky · Joseph Shtok · Sivan Harary · Mattias Marder · Abhishek Kumar · Rogerio Feris · Raja Giryes · Alex Bronstein
- 2018 Poster: Dialog-based Interactive Image Retrieval »
  Xiaoxiao Guo · Hui Wu · Yu Cheng · Steven Rennie · Gerald Tesauro · Rogerio Feris
- 2018 Poster: Theoretical Linear Convergence of Unfolded ISTA and Its Practical Weights and Thresholds »
  Xiaohan Chen · Jialin Liu · Zhangyang Wang · Wotao Yin
- 2018 Poster: Co-regularized Alignment for Unsupervised Domain Adaptation »
  Abhishek Kumar · Prasanna Sattigeri · Kahini Wadhawan · Leonid Karlinsky · Rogerio Feris · Bill Freeman · Gregory Wornell
- 2018 Spotlight: Theoretical Linear Convergence of Unfolded ISTA and Its Practical Weights and Thresholds »
  Xiaohan Chen · Jialin Liu · Zhangyang Wang · Wotao Yin
- 2012 Poster: Modeling the Forgetting Process using Image Regions »
  Aditya Khosla · Jianxiong Xiao · Antonio Torralba · Aude Oliva