We present a High-Resolution Transformer (HRFormer) that learns high-resolution representations for dense prediction tasks, in contrast to the original Vision Transformer, which produces low-resolution representations and has high memory and computational cost. We take advantage of the multi-resolution parallel design introduced in high-resolution convolutional networks (HRNet [45]), along with local-window self-attention that performs self-attention over small non-overlapping image windows [21], to improve memory and computation efficiency. In addition, we introduce a convolution into the FFN to exchange information across the disconnected image windows. We demonstrate the effectiveness of the High-Resolution Transformer on both human pose estimation and semantic segmentation tasks. For example, HRFormer outperforms Swin Transformer [27] by 1.3 AP on COCO pose estimation with 50% fewer parameters and 30% fewer FLOPs. Code is available at: https://github.com/HRNet/HRFormer
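As a rough illustration of why attention over small non-overlapping windows is cheaper than global attention, the sketch below partitions a toy feature map into windows and counts query-key pairs in each setting. It is a minimal pure-Python sketch, not code from the HRFormer repository: the function names `window_partition` and `attention_pairs`, the 4x4 map, and the window size of 2 are all assumptions chosen for illustration.

```python
def window_partition(feature_map, window_size):
    """Split an H x W grid (a list of rows) into non-overlapping square windows.

    Returns the windows in row-major order; each window is a flat list of the
    values it covers. Hypothetical helper, for illustration only.
    """
    height = len(feature_map)
    width = len(feature_map[0])
    if height % window_size or width % window_size:
        raise ValueError("height and width must be divisible by window_size")
    windows = []
    for top in range(0, height, window_size):
        for left in range(0, width, window_size):
            windows.append([
                feature_map[r][c]
                for r in range(top, top + window_size)
                for c in range(left, left + window_size)
            ])
    return windows


def attention_pairs(num_tokens):
    """Number of query-key pairs when every token attends to every token."""
    return num_tokens * num_tokens


# Toy 4x4 map whose entries are just position indices 0..15.
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
windows = window_partition(grid, 2)

# Global attention: all 16 positions attend to each other.
global_pairs = attention_pairs(4 * 4)
# Windowed attention: each position attends only within its own window.
windowed_pairs = sum(attention_pairs(len(w)) for w in windows)

print(windows[0])                    # positions covered by the top-left window
print(global_pairs, windowed_pairs)  # pair counts: global vs. windowed
```

Because positions in different windows never attend to each other in this scheme, some other mechanism has to carry information between windows; in the abstract, that role is given to the convolution added to the FFN.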
Author Information
Yuhui Yuan (Microsoft Research)
Rao Fu (Brown University)
Lang Huang (Peking University)
Weihong Lin (Microsoft)
Chao Zhang (Peking University)
Xilin Chen (Institute of Computing Technology, Chinese Academy of Sciences)
Jingdong Wang (Microsoft Research)
More from the Same Authors
- 2021 Spotlight: SPANN: Highly-efficient Billion-scale Approximate Nearest Neighborhood Search
  Qi Chen · Bing Zhao · Haidong Wang · Mingqin Li · Chuanjie Liu · Zengzhong Li · Mao Yang · Jingdong Wang
- 2021 Poster: SPANN: Highly-efficient Billion-scale Approximate Nearest Neighborhood Search
  Qi Chen · Bing Zhao · Haidong Wang · Mingqin Li · Chuanjie Liu · Zengzhong Li · Mao Yang · Jingdong Wang
- 2020 Poster: Self-Adaptive Training: beyond Empirical Risk Minimization
  Lang Huang · Chao Zhang · Hongyang Zhang
- 2019 Poster: Cross Attention Network for Few-shot Classification
  Ruibing Hou · Hong Chang · Bingpeng Ma · Shiguang Shan · Xilin Chen
- 2019 Poster: Multi-label Co-regularization for Semi-supervised Facial Action Unit Recognition
  Xuesong Niu · Hu Han · Shiguang Shan · Xilin Chen
- 2018 Poster: Weakly Supervised Dense Event Captioning in Videos
  Xin Wang · Wenbing Huang · Chuang Gan · Jingdong Wang · Wenwu Zhu · Junzhou Huang
- 2014 Poster: Generalized Unsupervised Manifold Alignment
  Zhen Cui · Hong Chang · Shiguang Shan · Xilin Chen