Timezone: »
Transformer emerges as a powerful tool for visual recognition. In addition to demonstrating competitive performance on a broad range of visual benchmarks, recent works also argue that Transformers are much more robust than Convolutions Neural Networks (CNNs). Nonetheless, surprisingly, we find these conclusions are drawn from unfair experimental settings, where Transformers and CNNs are compared at different scales and are applied with distinct training frameworks. In this paper, we aim to provide the first fair & in-depth comparisons between Transformers and CNNs, focusing on robustness evaluations. With our unified training setup, we first challenge the previous belief that Transformers outshine CNNs when measuring adversarial robustness. More surprisingly, we find CNNs can easily be as robust as Transformers on defending against adversarial attacks, if they properly adopt Transformers' training recipes. While regarding generalization on out-of-distribution samples, we show pre-training on (external) large-scale datasets is not a fundamental request for enabling Transformers to achieve better performance than CNNs. Moreover, our ablations suggest such stronger generalization is largely benefited by the Transformer's self-attention-like architectures per se, rather than by other training setups. We hope this work can help the community better understand and benchmark the robustness of Transformers and CNNs. The code and models are publicly available at: https://github.com/ytongbai/ViTs-vs-CNNs.
Author Information
Yutong Bai (Johns Hopkins University)
Jieru Mei (Johns Hopkins University)
Alan Yuille (JHU)
Cihang Xie (UC Santa Cruz)
More from the Same Authors
-
2021 : Occluded Video Instance Segmentation: Dataset and ICCV 2021 Challenge »
Jiyang Qi · Yan Gao · Yao Hu · Xinggang Wang · Xiaoyu Liu · Xiang Bai · Serge Belongie · Alan Yuille · Philip Torr · Song Bai -
2021 : Understanding Catastrophic Forgetting and Remembering in Continual Learning with Optimal Relevance Mapping »
prakhar kaushik · Adam Kortylewski · Alex Gain · Alan Yuille -
2021 Poster: Glance-and-Gaze Vision Transformer »
Qihang Yu · Yingda Xia · Yutong Bai · Yongyi Lu · Alan Yuille · Wei Shen -
2021 Poster: Neural View Synthesis and Matching for Semi-Supervised Few-Shot Learning of 3D Pose »
Angtian Wang · Shenxiao Mei · Alan Yuille · Adam Kortylewski -
2017 : Competition I: Adversarial Attacks and Defenses »
Alexey Kurakin · Ian Goodfellow · Samy Bengio · Yao Zhao · Yinpeng Dong · Tianyu Pang · Fangzhou Liao · Cihang Xie · Adithya Ganesh · Oguz Elibol -
2017 Poster: Label Distribution Learning Forests »
Wei Shen · KAI ZHAO · Yilu Guo · Alan Yuille