We present a general vision transformer backbone, called Orthogonal Transformer, in pursuit of both efficiency and effectiveness. A major challenge for vision transformers is that self-attention, the key element in capturing long-range dependency, is computationally expensive for dense prediction tasks (e.g., object detection). Coarse global self-attention and local self-attention have been designed to reduce this cost, but they either neglect local correlations or hurt global modeling. We present an orthogonal self-attention mechanism to alleviate these issues. Specifically, self-attention is computed in an orthogonal space that is reversible to the spatial domain but has much lower resolution. The capabilities of learning global dependency and exploring local correlations are maintained because every orthogonal token in self-attention can attend to all visual tokens. Remarkably, orthogonality is realized by constructing an endogenously orthogonal matrix that is friendly to neural networks and can be optimized to represent arbitrary orthogonal matrices. We also introduce Positional MLP to incorporate position information for arbitrary input resolutions and to enhance the capacity of MLP. Finally, we develop a hierarchical architecture for Orthogonal Transformer. Extensive experiments demonstrate its strong performance on a broad range of vision tasks, including image classification, object detection, instance segmentation and semantic segmentation.
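The key ideas in the abstract can be illustrated with a minimal numerical sketch. One standard way to build an endogenously orthogonal matrix that is friendly to gradient-based optimization is to take the matrix exponential of a skew-symmetric parameter; the paper's exact parameterization may differ, and all names below (`A`, `S`, `W`) are illustrative. The sketch also shows the reversibility property: because `W` is orthogonal, transforming tokens into the orthogonal space loses no information, and each orthogonal token mixes every spatial token.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(0)

# Hypothetical parameterization: for any square matrix A,
# S = A - A.T is skew-symmetric, and expm(S) is orthogonal.
# This is one common construction, not necessarily the paper's.
n, d = 16, 8                 # n visual tokens of dimension d
A = rng.standard_normal((n, n))
S = A - A.T                  # skew-symmetric: S.T == -S
W = expm(S)                  # orthogonal: W @ W.T == I

# Reversibility: W.T is the exact inverse of W, so the spatial
# tokens can be recovered from the orthogonal tokens.
x = rng.standard_normal((n, d))   # spatial (visual) tokens
z = W @ x                         # orthogonal tokens; each row of z
                                  # is a combination of ALL rows of x
x_rec = W.T @ z                   # exact reconstruction

print(np.allclose(W @ W.T, np.eye(n)))  # True
print(np.allclose(x_rec, x))            # True
```

In this view, self-attention computed among (a lower-resolution set of) orthogonal tokens can still model global dependencies, since every orthogonal token already aggregates information from the entire set of spatial tokens.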
Author Information
Huaibo Huang (Institute of Automation, Chinese Academy of Sciences)
Xiaoqiang Zhou (University of Science and Technology of China)
Ran He (NLPR, CASIA)
More from the Same Authors
- 2022 Poster: Are You Stealing My Model? Sample Correlation for Fingerprinting Deep Neural Networks (Jiyang Guan · Jian Liang · Ran He)
- 2022 Spotlight: Are You Stealing My Model? Sample Correlation for Fingerprinting Deep Neural Networks (Jiyang Guan · Jian Liang · Ran He)
- 2022 Spotlight: Lightning Talks 3A-1 (Shu Ding · Wanxing Chang · Jiyang Guan · Mouxiang Chen · Guan Gui · Yue Tan · Shiyun Lin · Guodong Long · Yuze Han · Wei Wang · Zhen Zhao · Ye Shi · Jian Liang · Chenghao Liu · Lei Qi · Ran He · Jie Ma · Zemin Liu · Xiang Li · Hoang Tuan · Luping Zhou · Zhihua Zhang · Jianling Sun · Jingya Wang · LU LIU · Tianyi Zhou · Lei Wang · Jing Jiang · Yinghuan Shi)
- 2020 Poster: AOT: Appearance Optimal Transport Based Identity Swapping for Forgery Detection (Hao Zhu · Chaoyou Fu · Qianyi Wu · Wayne Wu · Chen Qian · Ran He)
- 2019 Poster: Dual Variational Generation for Low Shot Heterogeneous Face Recognition (Chaoyou Fu · Xiang Wu · Yibo Hu · Huaibo Huang · Ran He)
- 2019 Spotlight: Dual Variational Generation for Low Shot Heterogeneous Face Recognition (Chaoyou Fu · Xiang Wu · Yibo Hu · Huaibo Huang · Ran He)
- 2018 Poster: Learning a High Fidelity Pose Invariant Model for High-resolution Face Frontalization (Jie Cao · Yibo Hu · Hongwen Zhang · Ran He · Zhenan Sun)
- 2018 Poster: IntroVAE: Introspective Variational Autoencoders for Photographic Image Synthesis (Huaibo Huang · Zhihang Li · Ran He · Zhenan Sun · Tieniu Tan)
- 2017 Poster: Deep Supervised Discrete Hashing (Qi Li · Zhenan Sun · Ran He · Tieniu Tan)