Timezone: »
In the past few years, transformers have achieved promising performance on various computer vision tasks. Unfortunately, the immense inference overhead of most existing vision transformers withholds them from being deployed on edge devices such as cell phones and smart watches. Knowledge distillation is a widely used paradigm for compressing cumbersome architectures into compact students via transferring information. However, most of them are designed for convolutional neural networks (CNNs), which do not fully investigate the character of vision transformers. In this paper, we fully utilize the patch-level information and propose a fine-grained manifold distillation method for transformer-based networks. Specifically, we train a tiny student model to match a pre-trained teacher model in the patch-level manifold space. Then, we decouple the manifold matching loss into three terms with careful design to further reduce the computational costs for the patch relationship. Equipped with the proposed method, a DeiT-Tiny model containing 5M parameters achieves 76.5\% top-1 accuracy on ImageNet-1k, which is +2.0\% higher than previous distillation approaches. Transfer learning results on other classification benchmarks and downstream vision tasks also demonstrate the superiority of our method over the state-of-the-art algorithms.
Author Information
Zhiwei Hao (Beijing Institute of Technology)
Jianyuan Guo (University of Sydney)
Ding Jia (Peking University)
Kai Han (Huawei Noah's Ark Lab)
Yehui Tang (Peking University)
Chao Zhang (Peking University)
Han Hu (Beijing Institute of Technology)
Yunhe Wang (Huawei Noah's Ark Lab)
More from the Same Authors
-
2022 Poster: Vision GNN: An Image is Worth Graph of Nodes »
Kai Han · Yunhe Wang · Jianyuan Guo · Yehui Tang · Enhua Wu -
2022 Spotlight: Lightning Talks 6B-3 »
Lingfeng Yang · Yao Lai · Zizheng Pan · Zhenyu Wang · Weicong Liang · Chuanyang Zheng · Jian-Wei Zhang · Peng Jin · Jing Liu · Xiuying Wei · Yao Mu · Xiang Li · YUHUI YUAN · Zizheng Pan · Yifan Sun · Yunchen Zhang · Jianfei Cai · Hao Luo · zheyang li · Jinfa Huang · Haoyu He · Yi Yang · Ping Luo · Fenglin Liu · Henghui Ding · Borui Zhao · Xiangguo Zhang · Kai Zhang · Pichao WANG · Bohan Zhuang · Wei Chen · Ruihao Gong · Zhi Yang · Xian Wu · Feng Ding · Jianfei Cai · Xiao Luo · Renjie Song · Weihong Lin · Jian Yang · Wenming Tan · Bohan Zhuang · Shanghang Zhang · Shen Ge · Fan Wang · Qi Zhang · Guoli Song · Jun Xiao · Hao Li · Ding Jia · David Clifton · Ye Ren · Fengwei Yu · Zheng Zhang · Jie Chen · Shiliang Pu · Xianglong Liu · Chao Zhang · Han Hu -
2022 Spotlight: Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning »
Weicong Liang · YUHUI YUAN · Henghui Ding · Xiao Luo · Weihong Lin · Ding Jia · Zheng Zhang · Chao Zhang · Han Hu -
2022 Spotlight: BiMLP: Compact Binary Architectures for Vision Multi-Layer Perceptrons »
Yixing Xu · Xinghao Chen · Yunhe Wang -
2022 Spotlight: GhostNetV2: Enhance Cheap Operation with Long-Range Attention »
Yehui Tang · Kai Han · Jianyuan Guo · Chang Xu · Chao Xu · Yunhe Wang -
2022 Spotlight: Lightning Talks 2B-1 »
Yehui Tang · Jian Wang · Zheng Chen · man zhou · Peng Gao · Chenyang Si · SHANGKUN SUN · Yixing Xu · Weihao Yu · Xinghao Chen · Kai Han · Hu Yu · Yulun Zhang · Chenhui Gou · Teli Ma · Yuanqi Chen · Yunhe Wang · Hongsheng Li · Jinjin Gu · Jianyuan Guo · Qiman Wu · Pan Zhou · Yu Zhu · Jie Huang · Chang Xu · Yichen Zhou · Haocheng Feng · Guodong Guo · yongbing zhang · Ziyi Lin · Feng Zhao · Ge Li · Junyu Han · Jinwei Gu · Jifeng Dai · Chao Xu · Xinchao Wang · Linghe Kong · Shuicheng Yan · Yu Qiao · Chen Change Loy · Xin Yuan · Errui Ding · Yunhe Wang · Deyu Meng · Jingdong Wang · Chongyi Li -
2022 Poster: Bridge the Gap Between Architecture Spaces via A Cross-Domain Predictor »
Yuqiao Liu · Yehui Tang · Zeqiong Lv · Yunhe Wang · Yanan Sun -
2022 Poster: Redistribution of Weights and Activations for AdderNet Quantization »
Ying Nie · Kai Han · Haikang Diao · Chuanjian Liu · Enhua Wu · Yunhe Wang -
2022 Poster: GhostNetV2: Enhance Cheap Operation with Long-Range Attention »
Yehui Tang · Kai Han · Jianyuan Guo · Chang Xu · Chao Xu · Yunhe Wang -
2022 Poster: Accelerating Sparse Convolution with Column Vector-Wise Sparsity »
Yijun Tan · Kai Han · Kang Zhao · Xianzhi Yu · Zidong Du · Yunji Chen · Yunhe Wang · Jun Yao -
2022 Poster: A Transformer-Based Object Detector with Coarse-Fine Crossing Representations »
Zhishan Li · Ying Nie · Kai Han · Jianyuan Guo · Lei Xie · Yunhe Wang -
2022 Poster: Expediting Large-Scale Vision Transformer for Dense Prediction without Fine-tuning »
Weicong Liang · YUHUI YUAN · Henghui Ding · Xiao Luo · Weihong Lin · Ding Jia · Zheng Zhang · Chao Zhang · Han Hu -
2022 Poster: BiMLP: Compact Binary Architectures for Vision Multi-Layer Perceptrons »
Yixing Xu · Xinghao Chen · Yunhe Wang -
2022 Poster: Random Normalization Aggregation for Adversarial Defense »
Minjing Dong · Xinghao Chen · Yunhe Wang · Chang Xu -
2021 Poster: Adder Attention for Vision Transformer »
Han Shu · Jiahao Wang · Hanting Chen · Lin Li · Yujiu Yang · Yunhe Wang -
2021 Poster: Dynamic Resolution Network »
Mingjian Zhu · Kai Han · Enhua Wu · Qiulin Zhang · Ying Nie · Zhenzhong Lan · Yunhe Wang -
2021 Poster: HRFormer: High-Resolution Vision Transformer for Dense Predict »
YUHUI YUAN · Rao Fu · Lang Huang · Weihong Lin · Chao Zhang · Xilin Chen · Jingdong Wang -
2021 Poster: Post-Training Quantization for Vision Transformer »
Zhenhua Liu · Yunhe Wang · Kai Han · Wei Zhang · Siwei Ma · Wen Gao -
2021 Poster: Handling Long-tailed Feature Distribution in AdderNets »
Minjing Dong · Yunhe Wang · Xinghao Chen · Chang Xu -
2021 Poster: Towards Stable and Robust AdderNets »
Minjing Dong · Yunhe Wang · Xinghao Chen · Chang Xu -
2021 Poster: Transformer in Transformer »
Kai Han · An Xiao · Enhua Wu · Jianyuan Guo · Chunjing XU · Yunhe Wang -
2021 Poster: An Empirical Study of Adder Neural Networks for Object Detection »
Xinghao Chen · Chang Xu · Minjing Dong · Chunjing XU · Yunhe Wang -
2021 Poster: Neural Architecture Dilation for Adversarial Robustness »
Yanxi Li · Zhaohui Yang · Yunhe Wang · Chang Xu -
2021 Poster: Learning Frequency Domain Approximation for Binary Neural Networks »
Yixing Xu · Kai Han · Chang Xu · Yehui Tang · Chunjing XU · Yunhe Wang -
2021 Poster: Augmented Shortcuts for Vision Transformers »
Yehui Tang · Kai Han · Chang Xu · An Xiao · Yiping Deng · Chao Xu · Yunhe Wang -
2021 Oral: Learning Frequency Domain Approximation for Binary Neural Networks »
Yixing Xu · Kai Han · Chang Xu · Yehui Tang · Chunjing XU · Yunhe Wang -
2020 Poster: SCOP: Scientific Control for Reliable Neural Network Pruning »
Yehui Tang · Yunhe Wang · Yixing Xu · Dacheng Tao · Chunjing XU · Chao Xu · Chang Xu -
2020 Poster: Kernel Based Progressive Distillation for Adder Neural Networks »
Yixing Xu · Chang Xu · Xinghao Chen · Wei Zhang · Chunjing XU · Yunhe Wang -
2020 Poster: Self-Adaptive Training: beyond Empirical Risk Minimization »
Lang Huang · Chao Zhang · Hongyang Zhang -
2020 Poster: Model Rubik’s Cube: Twisting Resolution, Depth and Width for TinyNets »
Kai Han · Yunhe Wang · Qiulin Zhang · Wei Zhang · Chunjing XU · Tong Zhang -
2020 Spotlight: Kernel Based Progressive Distillation for Adder Neural Networks »
Yixing Xu · Chang Xu · Xinghao Chen · Wei Zhang · Chunjing XU · Yunhe Wang -
2020 Poster: Residual Distillation: Towards Portable Deep Neural Networks without Shortcuts »
Guilin Li · Junlei Zhang · Yunhe Wang · Chuanjian Liu · Matthias Tan · Yunfeng Lin · Wei Zhang · Jiashi Feng · Tong Zhang -
2020 Poster: Searching for Low-Bit Weights in Quantized Neural Networks »
Zhaohui Yang · Yunhe Wang · Kai Han · Chunjing XU · Chao Xu · Dacheng Tao · Chang Xu -
2019 Poster: Positive-Unlabeled Compression on the Cloud »
Yixing Xu · Yunhe Wang · Hanting Chen · Kai Han · Chunjing XU · Dacheng Tao · Chang Xu -
2018 Poster: Joint Sub-bands Learning with Clique Structures for Wavelet Domain Super-Resolution »
Zhisheng Zhong · Tiancheng Shen · Yibo Yang · Zhouchen Lin · Chao Zhang -
2018 Poster: Greedy Hash: Towards Fast Optimization for Accurate Hash Coding in CNN »
Shupeng Su · Chao Zhang · Kai Han · Yonghong Tian -
2018 Poster: Learning Versatile Filters for Efficient Convolutional Neural Networks »
Yunhe Wang · Chang Xu · Chunjing XU · Chao Xu · Dacheng Tao -
2016 Poster: CNNpack: Packing Convolutional Neural Networks in the Frequency Domain »
Yunhe Wang · Chang Xu · Shan You · Dacheng Tao · Chao Xu