Timezone: »
Vision transformers (ViTs) have pushed the state-of-the-art for various visual recognition tasks by patch-wise image tokenization followed by self-attention. However, the employment of self-attention modules results in a quadratic complexity in both computation and memory usage. Various attempts on approximating the self-attention computation with linear complexity have been made in Natural Language Processing. However, an in-depth analysis in this work shows that they are either theoretically flawed or empirically ineffective for visual recognition. We further identify that their limitations are rooted in keeping the softmax self-attention during approximations. Specifically, conventional self-attention is computed by normalizing the scaled dot-product between token feature vectors. Keeping this softmax operation challenges any subsequent linearization efforts. Based on this insight, for the first time, a softmax-free transformer or SOFT is proposed. To remove softmax in self-attention, Gaussian kernel function is used to replace the dot-product similarity without further normalization. This enables a full self-attention matrix to be approximated via a low-rank matrix decomposition. The robustness of the approximation is achieved by calculating its Moore-Penrose inverse using a Newton-Raphson method. Extensive experiments on ImageNet show that our SOFT significantly improves the computational efficiency of existing ViT variants. Crucially, with a linear complexity, much longer token sequences are permitted in SOFT, resulting in superior trade-off between accuracy and complexity.
Author Information
Jiachen Lu (Fudan University)
Jinghan Yao
Junge Zhang (Fudan University)
Xiatian Zhu (Samsung AI Centre, Cambridge)
Hang Xu (Huawei Noah's Ark Lab)
Weiguo Gao (Fudan University)
Chunjing XU (Huawei Technologies)
Tao Xiang (Samsung AI Centre, Cambridge)
Li Zhang (University of Oxford, Queen Mary University of London)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Spotlight: SOFT: Softmax-free Transformer with Linear Complexity »
Dates n/a. Room
More from the Same Authors
-
2021 : One Million Scenes for Autonomous Driving: ONCE Dataset »
Jiageng Mao · Niu Minzhe · ChenHan Jiang · hanxue liang · Jingheng Chen · Xiaodan Liang · Yamin Li · Chaoqiang Ye · Wei Zhang · Zhenguo Li · Jie Yu · Hang Xu · Chunjing XU -
2021 : SODA10M: A Large-Scale 2D Self/Semi-Supervised Object Detection Dataset for Autonomous Driving »
Jianhua Han · Xiwen Liang · Hang Xu · Kai Chen · Lanqing Hong · Jiageng Mao · Chaoqiang Ye · Wei Zhang · Zhenguo Li · Xiaodan Liang · Chunjing XU -
2022 Poster: MetaTeacher: Coordinating Multi-Model Domain Adaptation for Medical Image Classification »
Zhenbin Wang · Mao Ye · Xiatian Zhu · Liuhan Peng · Liang Tian · Yingying Zhu -
2022 Spotlight: Lightning Talks 6A-4 »
Xiu-Shen Wei · Konstantina Dritsa · Guillaume Huguet · ABHRA CHAUDHURI · Zhenbin Wang · Kevin Qinghong Lin · Yutong Chen · Jianan Zhou · Yongsen Mao · Junwei Liang · Jinpeng Wang · Mao Ye · Yiming Zhang · Aikaterini Thoma · H.-Y. Xu · Daniel Sumner Magruder · Enwei Zhang · Jianing Zhu · Ronglai Zuo · Massimiliano Mancini · Hanxiao Jiang · Jun Zhang · Fangyun Wei · Faen Zhang · Ioannis Pavlopoulos · Zeynep Akata · Xiatian Zhu · Jingfeng ZHANG · Alexander Tong · Mattia Soldan · Chunhua Shen · Yuxin Peng · Liuhan Peng · Michael Wray · Tongliang Liu · Anjan Dutta · Yu Wu · Oluwadamilola Fasina · Panos Louridas · Angel Chang · Manik Kuchroo · Manolis Savva · Shujie LIU · Wei Zhou · Rui Yan · Gang Niu · Liang Tian · Bo Han · Eric Z. XU · Guy Wolf · Yingying Zhu · Brian Mak · Difei Gao · Masashi Sugiyama · Smita Krishnaswamy · Rong-Cheng Tu · Wenzhe Zhao · Weijie Kong · Chengfei Cai · WANG HongFa · Dima Damen · Bernard Ghanem · Wei Liu · Mike Zheng Shou -
2022 Spotlight: MetaTeacher: Coordinating Multi-Model Domain Adaptation for Medical Image Classification »
Zhenbin Wang · Mao Ye · Xiatian Zhu · Liuhan Peng · Liang Tian · Yingying Zhu -
2022 Spotlight: Lightning Talks 5B-4 »
Yuezhi Yang · Zeyu Yang · Yong Lin · Yishi Xu · Linan Yue · Tao Yang · Weixin Chen · Qi Liu · Jiaqi Chen · Dongsheng Wang · Baoyuan Wu · Yuwang Wang · Hao Pan · Shengyu Zhu · Zhenwei Miao · Yan Lu · Lu Tan · Bo Chen · Yichao Du · Haoqian Wang · Wei Li · Yanqing An · Ruiying Lu · Peng Cui · Nanning Zheng · Li Wang · Zhibin Duan · Xiatian Zhu · Mingyuan Zhou · Enhong Chen · Li Zhang -
2022 Spotlight: DeepInteraction: 3D Object Detection via Modality Interaction »
Zeyu Yang · Jiaqi Chen · Zhenwei Miao · Wei Li · Xiatian Zhu · Li Zhang -
2022 Spotlight: ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning »
Junting Pan · Ziyi Lin · Xiatian Zhu · Jing Shao · Hongsheng Li -
2022 Poster: Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark »
Jiaxi Gu · Xiaojun Meng · Guansong Lu · Lu Hou · Niu Minzhe · Xiaodan Liang · Lewei Yao · Runhui Huang · Wei Zhang · Xin Jiang · Chunjing XU · Hang Xu -
2022 Poster: DetCLIP: Dictionary-Enriched Visual-Concept Paralleled Pre-training for Open-world Detection »
Lewei Yao · Jianhua Han · Youpeng Wen · Xiaodan Liang · Dan Xu · Wei Zhang · Zhenguo Li · Chunjing XU · Hang Xu -
2022 Poster: Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving »
Xiwen Liang · Yangxin Wu · Jianhua Han · Hang Xu · Chunjing XU · Xiaodan Liang -
2022 Poster: DeepInteraction: 3D Object Detection via Modality Interaction »
Zeyu Yang · Jiaqi Chen · Zhenwei Miao · Wei Li · Xiatian Zhu · Li Zhang -
2022 Poster: ST-Adapter: Parameter-Efficient Image-to-Video Transfer Learning »
Junting Pan · Ziyi Lin · Xiatian Zhu · Jing Shao · Hongsheng Li -
2021 Poster: Low-Fidelity Video Encoder Optimization for Temporal Action Localization »
Mengmeng Xu · Juan Manuel Perez Rua · Xiatian Zhu · Bernard Ghanem · Brais Martinez -
2021 Poster: One Loss for All: Deep Hashing with a Single Cosine Similarity based Learning Objective »
Jiun Tian Hoe · Kam Woh Ng · Tianyu Zhang · Chee Seng Chan · Yi-Zhe Song · Tao Xiang -
2021 Poster: Progressive Coordinate Transforms for Monocular 3D Object Detection »
Li Wang · Li Zhang · Yi Zhu · Zhi Zhang · Tong He · Mu Li · Xiangyang Xue -
2021 Poster: Learning Transferable Features for Point Cloud Detection via 3D Contrastive Co-training »
Zeng Yihan · Chunwei Wang · Yunbo Wang · Hang Xu · Chaoqiang Ye · Zhen Yang · Chao Ma -
2021 Poster: Transformer in Transformer »
Kai Han · An Xiao · Enhua Wu · Jianyuan Guo · Chunjing XU · Yunhe Wang -
2021 Poster: An Empirical Study of Adder Neural Networks for Object Detection »
Xinghao Chen · Chang Xu · Minjing Dong · Chunjing XU · Yunhe Wang -
2021 Poster: Learning Frequency Domain Approximation for Binary Neural Networks »
Yixing Xu · Kai Han · Chang Xu · Yehui Tang · Chunjing XU · Yunhe Wang -
2021 Poster: S$^3$: Sign-Sparse-Shift Reparametrization for Effective Training of Low-bit Shift Networks »
Xinlin Li · Bang Liu · Yaoliang Yu · Wulong Liu · Chunjing XU · Vahid Partovi Nia -
2021 Oral: Learning Frequency Domain Approximation for Binary Neural Networks »
Yixing Xu · Kai Han · Chang Xu · Yehui Tang · Chunjing XU · Yunhe Wang -
2020 Poster: SCOP: Scientific Control for Reliable Neural Network Pruning »
Yehui Tang · Yunhe Wang · Yixing Xu · Dacheng Tao · Chunjing XU · Chao Xu · Chang Xu -
2020 Poster: Kernel Based Progressive Distillation for Adder Neural Networks »
Yixing Xu · Chang Xu · Xinghao Chen · Wei Zhang · Chunjing XU · Yunhe Wang -
2020 Poster: Model Rubik’s Cube: Twisting Resolution, Depth and Width for TinyNets »
Kai Han · Yunhe Wang · Qiulin Zhang · Wei Zhang · Chunjing XU · Tong Zhang -
2020 Spotlight: Kernel Based Progressive Distillation for Adder Neural Networks »
Yixing Xu · Chang Xu · Xinghao Chen · Wei Zhang · Chunjing XU · Yunhe Wang -
2020 Poster: Searching for Low-Bit Weights in Quantized Neural Networks »
Zhaohui Yang · Yunhe Wang · Kai Han · Chunjing XU · Chao Xu · Dacheng Tao · Chang Xu -
2020 Poster: Bridging the Gap between Sample-based and One-shot Neural Architecture Search with BONAS »
Han Shi · Renjie Pi · Hang Xu · Zhenguo Li · James Kwok · Tong Zhang -
2020 Poster: Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation »
Yangxin Wu · Gengwei Zhang · Hang Xu · Xiaodan Liang · Liang Lin -
2019 Poster: Positive-Unlabeled Compression on the Cloud »
Yixing Xu · Yunhe Wang · Hanting Chen · Kai Han · Chunjing XU · Dacheng Tao · Chang Xu -
2018 Poster: Domain-Invariant Projection Learning for Zero-Shot Recognition »
An Zhao · Mingyu Ding · Jiechao Guan · Zhiwu Lu · Tao Xiang · Ji-Rong Wen -
2018 Poster: Learning Versatile Filters for Efficient Convolutional Neural Networks »
Yunhe Wang · Chang Xu · Chunjing XU · Chao Xu · Dacheng Tao -
2017 : Break + Poster (1) »
Devendra Singh Chaplot · CHIH-YAO MA · Simon Brodeur · Eri Matsuo · Ichiro Kobayashi · Seitaro Shinagawa · Koichiro Yoshino · Yuhong Guo · Ben Murdoch · Kanthashree Mysore Sathyendra · Daniel Ricks · Haichao Zhang · Joshua Peterson · Li Zhang · Mircea Mironenco · Peter Anderson · Mark Johnson · Kang Min Yoo · Guntis Barzdins · Ahmed H Zaidi · Martin Andrews · Sam Witteveen · SUBBAREDDY OOTA · Prashanth Vijayaraghavan · Ke Wang · Yan Zhu · Renars Liepins · Max Quinn · Amit Raj · Vincent Cartillier · Eric Chu · Ethan Caballero · Fritz Obermeyer