Timezone: »
Poster
Green Hierarchical Vision Transformer for Masked Image Modeling
Lang Huang · Shan You · Mingkai Zheng · Fei Wang · Chen Qian · Toshihiko Yamasaki
We present an efficient approach for Masked Image Modeling (MIM) with hierarchical Vision Transformers (ViTs), allowing the hierarchical ViTs to discard masked patches and operate only on the visible ones. Our approach consists of three key designs. First, for window attention, we propose a Group Window Attention scheme following the Divide-and-Conquer strategy. To mitigate the quadratic complexity of the self-attention w.r.t. the number of patches, group attention encourages a uniform partition that visible patches within each local window of arbitrary size can be grouped with equal size, where masked self-attention is then performed within each group. Second, we further improve the grouping strategy via the Dynamic Programming algorithm to minimize the overall computation cost of the attention on the grouped patches. Third, as for the convolution layers, we convert them to the Sparse Convolution that works seamlessly with the sparse data, i.e., the visible patches in MIM. As a result, MIM can now work on most, if not all, hierarchical ViTs in a green and efficient way. For example, we can train the hierarchical ViTs, e.g., Swin Transformer and Twins Transformer, about 2.7$\times$ faster and reduce the GPU memory usage by 70%, while still enjoying competitive performance on ImageNet classification and the superiority on downstream COCO object detection benchmarks.
Author Information
Lang Huang (The University of Tokyo)
I am currently a second-year Ph.D. student at the Department of Information & Communication Engineering, The University of Tokyo. Prior to that, I received a Master’s degree from the Department of Machine Intelligence, School of Electronics Engineering and Computer Science, Peking University in 2021. My research interests include self-supervised representation learning, robust learning from noisy data, and vision transformers.
Shan You (SenseTime Research)
Mingkai Zheng (University of Sydney)
Fei Wang (Sensetime)
Chen Qian (SenseTime)
Toshihiko Yamasaki (The University of Tokyo)
More from the Same Authors
-
2022 Poster: Weak-shot Semantic Segmentation via Dual Similarity Transfer »
Junjie Chen · Li Niu · Siyuan Zhou · Jianlou Si · Chen Qian · Liqing Zhang -
2023 Poster: Detecting Any Human-Object Interaction Relationship: Universal HOI Detector with Spatial Prompt Learning on Foundation Models »
Yichao Cao · Qingfei Tang · Xiu Su · Song Chen · Shan You · Chang Xu · Xiaobo Lu -
2023 Poster: Adversarial Training from Mean Field Perspective »
Soichiro Kumano · Hiroshi Kera · Toshihiko Yamasaki -
2023 Poster: Knowledge Diffusion for Distillation »
Tao Huang · Yuan Zhang · Mingkai Zheng · Shan You · Fei Wang · Chen Qian · Chang Xu -
2022 Spotlight: Lightning Talks 6B-4 »
Junjie Chen · Chuanxia Zheng · JINLONG LI · Yu Shi · Shichao Kan · Yu Wang · Fermín Travi · Ninh Pham · Lei Chai · Guobing Gan · Tung-Long Vuong · Gonzalo Ruarte · Tao Liu · Li Niu · Jingjing Zou · Zequn Jie · Peng Zhang · Ming LI · Yixiong Liang · Guolin Ke · Jianfei Cai · Gaston Bujia · Sunzhu Li · Siyuan Zhou · Jingyang Lin · Xu Wang · Min Li · Zhuoming Chen · Qing Ling · Xiaolin Wei · Xiuqing Lu · Shuxin Zheng · Dinh Phung · Yigang Cen · Jianlou Si · Juan Esteban Kamienkowski · Jianxin Wang · Chen Qian · Lin Ma · Benyou Wang · Yingwei Pan · Tie-Yan Liu · Liqing Zhang · Zhihai He · Ting Yao · Tao Mei -
2022 Spotlight: Weak-shot Semantic Segmentation via Dual Similarity Transfer »
Junjie Chen · Li Niu · Siyuan Zhou · Jianlou Si · Chen Qian · Liqing Zhang -
2022 Poster: Knowledge Distillation from A Stronger Teacher »
Tao Huang · Shan You · Fei Wang · Chen Qian · Chang Xu -
2022 Poster: Searching for Better Spatio-temporal Alignment in Few-Shot Action Recognition »
Yichao Cao · Xiu Su · Qingfei Tang · Shan You · Xiaobo Lu · Chang Xu -
2021 Poster: HRFormer: High-Resolution Vision Transformer for Dense Predict »
YUHUI YUAN · Rao Fu · Lang Huang · Weihong Lin · Chao Zhang · Xilin Chen · Jingdong Wang -
2021 Poster: ReSSL: Relational Self-Supervised Learning with Weak Augmentation »
Mingkai Zheng · Shan You · Fei Wang · Chen Qian · Changshui Zhang · Xiaogang Wang · Chang Xu -
2020 Poster: Self-Adaptive Training: beyond Empirical Risk Minimization »
Lang Huang · Chao Zhang · Hongyang Zhang -
2020 Poster: Agree to Disagree: Adaptive Ensemble Knowledge Distillation in Gradient Space »
Shangchen Du · Shan You · Xiaojie Li · Jianlong Wu · Fei Wang · Chen Qian · Changshui Zhang -
2020 Poster: AOT: Appearance Optimal Transport Based Identity Swapping for Forgery Detection »
Hao Zhu · Chaoyou Fu · Qianyi Wu · Wayne Wu · Chen Qian · Ran He -
2020 Poster: ISTA-NAS: Efficient and Consistent Neural Architecture Search by Sparse Coding »
Yibo Yang · Hongyang Li · Shan You · Fei Wang · Chen Qian · Zhouchen Lin