Vision transformers (ViTs) have demonstrated impressive performance across numerous machine vision tasks. These models are based on multi-head self-attention mechanisms that can flexibly attend to a sequence of image patches to encode contextual cues. An important question is how such flexibility (in attending to image-wide context conditioned on a given patch) can help handle nuisances in natural images, e.g., severe occlusions, domain shifts, spatial permutations, and adversarial and natural perturbations. We systematically study this question via an extensive set of experiments encompassing three ViT families and provide comparisons with a high-performing convolutional neural network (CNN). We show and analyze the following intriguing properties of ViTs: (a) Transformers are highly robust to severe occlusions, perturbations, and domain shifts, e.g., they retain as much as 60% top-1 accuracy on ImageNet even after 80% of the image content is randomly occluded. (b) The robustness to occlusions is not due to texture bias; instead, we show that ViTs are significantly less biased towards local textures than CNNs. When properly trained to encode shape-based features, ViTs demonstrate shape recognition capability comparable to that of the human visual system, previously unmatched in the literature. (c) Using ViTs to encode shape representations leads to an interesting consequence: accurate semantic segmentation without pixel-level supervision. (d) Off-the-shelf features from a single ViT model can be combined to create a feature ensemble, leading to high accuracy rates across a range of classification datasets in both traditional and few-shot learning paradigms. We show that the effective features of ViTs are due to the flexible and dynamic receptive fields made possible by self-attention mechanisms. Our code will be publicly released.
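To make the occlusion setting in (a) concrete, the following is a minimal sketch of randomly occluding a fixed fraction of image patches. It is not the authors' released code: the 14 x 14 grid of 16-pixel patches, the zero fill value, and the function names `random_patch_mask` and `apply_patch_mask` are all assumptions chosen for illustration.

```python
import random


def random_patch_mask(grid_rows=14, grid_cols=14, drop_ratio=0.8, seed=0):
    """Return a grid of booleans: True = patch kept, False = patch occluded.

    The grid size and drop ratio are illustrative assumptions.
    """
    n_patches = grid_rows * grid_cols
    n_drop = round(drop_ratio * n_patches)
    rng = random.Random(seed)
    # Sample patch indices without replacement so exactly n_drop are occluded.
    dropped = set(rng.sample(range(n_patches), n_drop))
    return [
        [(r * grid_cols + c) not in dropped for c in range(grid_cols)]
        for r in range(grid_rows)
    ]


def apply_patch_mask(image, mask, patch_size=16, fill=0.0):
    """Overwrite every pixel of each occluded patch with `fill`.

    `image` is a list of rows of pixel values (single channel for brevity).
    A new image is returned; the input is left unchanged.
    """
    return [
        [
            pixel if mask[y // patch_size][x // patch_size] else fill
            for x, pixel in enumerate(row)
        ]
        for y, row in enumerate(image)
    ]


# Usage: occlude roughly 80% of a 224 x 224 single-channel image.
image = [[1.0] * 224 for _ in range(224)]
mask = random_patch_mask()
occluded = apply_patch_mask(image, mask)
kept_patches = sum(sum(row) for row in mask)
occluded_fraction = sum(row.count(0.0) for row in occluded) / (224 * 224)
```

With 196 patches and a drop ratio of 0.8, 157 patches are occluded and 39 are kept, so the occluded pixel fraction is 157/196. Evaluating a classifier on such images at several drop ratios is one way to probe the kind of robustness described in (a).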
Author Information
Muhammad Muzammal Naseer (Australian National University)
Kanchana Ranasinghe (State University of New York, Stony Brook)
Salman H Khan (Inception Institute of Artificial Intelligence)
Munawar Hayat (IIAI)
Fahad Shahbaz Khan (Inception Institute of Artificial Intelligence)
Ming-Hsuan Yang (Google / UC Merced)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Poster: Intriguing Properties of Vision Transformers
  Fri. Dec 10th 04:30 -- 06:00 PM Room Virtual
More from the Same Authors
- 2022 Poster: An Investigation into Whitening Loss for Self-supervised Learning
  Xi Weng · Lei Huang · Lei Zhao · Rao Anwer · Salman Khan · Fahad Shahbaz Khan
- 2023 Poster: ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections
  Chun-Han Yao · Amit Raj · Wei-Chih Hung · Michael Rubinstein · Yuanzhen Li · Ming-Hsuan Yang · Varun Jampani
- 2023 Poster: Language-based Action Concept Spaces for Video Self-Supervised Learning
  Kanchana Ranasinghe · Michael Ryoo
- 2023 Poster: A Tale of Two Features: Stable Diffusion Complements DINO for Zero-Shot Semantic Correspondence
  Junyi Zhang · Charles Herrmann · Junhwa Hur · Luisa Polania Cabrera · Varun Jampani · Deqing Sun · Ming-Hsuan Yang
- 2023 Poster: AIMS: All-Inclusive Multi-Level Segmentation
  Lu Qi · Jason Kuen · Weidong Guo · Jiuxiang Gu · Zhe Lin · Bo Du · Yu Xu · Ming-Hsuan Yang
- 2023 Poster: Diffusion-SS3D: Diffusion Model for Semi-supervised 3D Object Detection
  Cheng-Ju Ho · Chen-Hsuan Tai · Yen-Yu Lin · Ming-Hsuan Yang · Yi-Hsuan Tsai
- 2023 Poster: PromptIR: Prompting for All-in-One Image Restoration
  Vaishnav Potlapalli · Syed Waqas Zamir · Salman Khan · Fahad Shahbaz Khan
- 2023 Poster: Module-wise Adaptive Distillation for Multimodality Foundation Models
  Chen Liang · Jiahui Yu · Ming-Hsuan Yang · Matthew Brown · Yin Cui · Tuo Zhao · Boqing Gong · Tianyi Zhou
- 2023 Poster: Align Your Prompts: Test-Time Prompting with Distribution Alignment for Zero-Shot Generalization
  Jameel Abdul Samadh · Mohammad Hanan Gani · Noor Hussein · Muhammad Uzair Khattak · Muzammal Naseer · Salman Khan · Fahad Shahbaz Khan
- 2023 Poster: SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
  Lijun Yu · Yong Cheng · Zhiruo Wang · Vivek Kumar · Wolfgang Macherey · Yanping Huang · David Ross · Irfan Essa · Yonatan Bisk · Ming-Hsuan Yang · Kevin Murphy · Alexander Hauptmann · Lu Jiang
- 2023 Poster: 3D Indoor Instance Segmentation in an Open-World
  Mohamed Boudjoghra · Salwa Al Khatib · Jean Lahoud · Hisham Cholakkal · Rao Anwer · Salman Khan · Fahad Shahbaz Khan
- 2023 Poster: Cal-DETR: Calibrated Detection Transformer
  Muhammad Akhtar Munir · Salman Khan · Muhammad Haris Khan · Mohsen Ali · Fahad Shahbaz Khan
- 2023 Poster: Video Timeline Modeling For News Story Understanding
  Meng Liu · Mingda Zhang · Jialu Liu · Hanjun Dai · Ming-Hsuan Yang · Shuiwang Ji · Zheyun Feng · Boqing Gong
- 2022 Workshop: Vision Transformers: Theory and applications
  Fahad Shahbaz Khan · Gul Varol · Salman Khan · Ping Luo · Rao Anwer · Ashish Vaswani · Hisham Cholakkal · Niki Parmar · Joost van de Weijer · Mubarak Shah
- 2022 Spotlight: Lightning Talks 1B-3
  Chaofei Wang · Qixun Wang · Jing Xu · Long-Kai Huang · Xi Weng · Fei Ye · Harsh Rangwani · shrinivas ramasubramanian · Yifei Wang · Qisen Yang · Xu Luo · Lei Huang · Adrian G. Bors · Ying Wei · Xinglin Pan · Sho Takemori · Hong Zhu · Rui Huang · Lei Zhao · Yisen Wang · Kato Takashi · Shiji Song · Yanan Li · Rao Anwer · Yuhei Umeda · Salman Khan · Gao Huang · Wenjie Pei · Fahad Shahbaz Khan · Venkatesh Babu R · Zenglin Xu
- 2022 Spotlight: An Investigation into Whitening Loss for Self-supervised Learning
  Xi Weng · Lei Huang · Lei Zhao · Rao Anwer · Salman Khan · Fahad Shahbaz Khan
- 2022 Poster: LASSIE: Learning Articulated Shapes from Sparse Image Ensemble via 3D Part Discovery
  Chun-Han Yao · Wei-Chih Hung · Yuanzhen Li · Michael Rubinstein · Ming-Hsuan Yang · Varun Jampani
- 2022 Poster: Bridging the Gap between Object and Image-level Representations for Open-Vocabulary Detection
  Hanoona Bangalath · Muhammad Maaz · Muhammad Uzair Khattak · Salman Khan · Fahad Shahbaz Khan
- 2021 Poster: Learning 3D Dense Correspondence via Canonical Point Autoencoder
  An-Chieh Cheng · Xueting Li · Min Sun · Ming-Hsuan Yang · Sifei Liu
- 2021 Poster: Exploring Cross-Video and Cross-Modality Signals for Weakly-Supervised Audio-Visual Video Parsing
  Yan-Bo Lin · Hung-Yu Tseng · Hsin-Ying Lee · Yen-Yu Lin · Ming-Hsuan Yang
- 2021 Poster: Rethinking conditional GAN training: An approach using geometrically structured latent manifolds
  Sameera Ramasinghe · Moshiur Farazi · Salman H Khan · Nick Barnes · Stephen Gould
- 2021 Poster: End-to-end Multi-modal Video Temporal Grounding
  Yi-Wen Chen · Yi-Hsuan Tsai · Ming-Hsuan Yang
- 2020 Poster: Online Adaptation for Consistent Mesh Reconstruction in the Wild
  Xueting Li · Sifei Liu · Shalini De Mello · Kihwan Kim · Xiaolong Wang · Ming-Hsuan Yang · Jan Kautz
- 2019 Poster: Random Path Selection for Continual Learning
  Jathushan Rajasegaran · Munawar Hayat · Salman H Khan · Fahad Shahbaz Khan · Ling Shao
- 2019 Poster: Quadratic Video Interpolation
  Xiangyu Xu · Li Siyao · Wenxiu Sun · Qian Yin · Ming-Hsuan Yang
- 2019 Spotlight: Quadratic Video Interpolation
  Xiangyu Xu · Li Siyao · Wenxiu Sun · Qian Yin · Ming-Hsuan Yang
- 2019 Poster: Cross-Domain Transferability of Adversarial Perturbations
  Muhammad Muzammal Naseer · Salman H Khan · Muhammad Haris Khan · Fahad Shahbaz Khan · Fatih Porikli
- 2019 Poster: Joint-task Self-supervised Learning for Temporal Correspondence
  Xueting Li · Sifei Liu · Shalini De Mello · Xiaolong Wang · Jan Kautz · Ming-Hsuan Yang
- 2019 Poster: Dancing to Music
  Hsin-Ying Lee · Xiaodong Yang · Ming-Yu Liu · Ting-Chun Wang · Yu-Ding Lu · Ming-Hsuan Yang · Jan Kautz
- 2018 Poster: Deep Non-Blind Deconvolution via Generalized Low-Rank Approximation
  Wenqi Ren · Jiawei Zhang · Lin Ma · Jinshan Pan · Xiaochun Cao · Wangmeng Zuo · Wei Liu · Ming-Hsuan Yang
- 2018 Poster: Context-aware Synthesis and Placement of Object Instances
  Donghoon Lee · Sifei Liu · Jinwei Gu · Ming-Yu Liu · Ming-Hsuan Yang · Jan Kautz
- 2018 Poster: Deep Attentive Tracking via Reciprocative Learning
  Shi Pu · Yibing Song · Chao Ma · Honggang Zhang · Ming-Hsuan Yang
- 2017 Poster: Learning Affinity via Spatial Propagation Networks
  Sifei Liu · Shalini De Mello · Jinwei Gu · Guangyu Zhong · Ming-Hsuan Yang · Jan Kautz
- 2017 Poster: Semi-Supervised Learning for Optical Flow with Generative Adversarial Networks
  Wei-Sheng Lai · Jia-Bin Huang · Ming-Hsuan Yang
- 2017 Poster: Universal Style Transfer via Feature Transforms
  Yijun Li · Chen Fang · Jimei Yang · Zhaowen Wang · Xin Lu · Ming-Hsuan Yang
- 2015 Poster: Weakly-supervised Disentangling with Recurrent Transformations for 3D View Synthesis
  Jimei Yang · Scott E Reed · Ming-Hsuan Yang · Honglak Lee