The tremendous success of large models trained on extensive datasets demonstrates that scale is a key ingredient in achieving superior results. It is therefore imperative to reconsider whether it is reasonable to design knowledge distillation (KD) approaches for limited-capacity architectures solely on small-scale datasets. In this paper, we identify the small data pitfall present in previous KD methods, which leads to underestimating the power of the vanilla KD framework on large-scale datasets such as ImageNet-1K. Specifically, we show that employing stronger data augmentation and larger datasets directly narrows the gap between vanilla KD and other meticulously designed KD variants. This highlights the necessity of designing and evaluating KD approaches under practical conditions, free from the limitations of small-scale datasets. Our investigation of vanilla KD and its variants under more complex schemes, including stronger training strategies and different model capacities, demonstrates that vanilla KD is elegantly simple yet astonishingly effective in large-scale scenarios. Without bells and whistles, we obtain state-of-the-art ResNet-50, ViT-S, and ConvNeXtV2-T models on ImageNet, achieving 83.1%, 84.3%, and 85.0% top-1 accuracy, respectively. PyTorch code and checkpoints are available at https://github.com/Hao840/vanillaKD.
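For reference, the vanilla KD objective discussed above is the standard logit-matching loss of Hinton et al.: a temperature-softened KL term between teacher and student predictions combined with the usual cross-entropy on ground-truth labels. The sketch below is a minimal PyTorch illustration of that objective, not the paper's exact training recipe; the temperature T and mixing weight alpha are illustrative placeholders.

    import torch.nn.functional as F

    def vanilla_kd_loss(student_logits, teacher_logits, labels, T=1.0, alpha=0.5):
        # Soft-target term: KL divergence between temperature-scaled distributions,
        # scaled by T^2 to keep gradient magnitudes comparable across temperatures.
        soft = F.kl_div(
            F.log_softmax(student_logits / T, dim=1),
            F.softmax(teacher_logits / T, dim=1),
            reduction="batchmean",
        ) * (T * T)
        # Hard-label term: ordinary cross-entropy against the ground-truth labels.
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1.0 - alpha) * hard

    # Example usage (shapes: [batch, num_classes] logits, [batch] integer labels):
    # loss = vanilla_kd_loss(student(x), teacher(x).detach(), y, T=1.0, alpha=0.5)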
Author Information
Zhiwei Hao (Beijing Institute of Technology)
Jianyuan Guo (University of Sydney)
Kai Han (Huawei Noah's Ark Lab)
Han Hu (Beijing Institute of Technology)
Chang Xu (University of Sydney)
Yunhe Wang (Huawei Noah's Ark Lab)
More from the Same Authors
- 2020 Meetup: MeetUp: Sydney Australia »
  Chang Xu
- 2021 Meetup: Sydney, Australia »
  Chang Xu
- 2022 Poster: Learning Efficient Vision Transformers via Fine-Grained Manifold Distillation »
  Zhiwei Hao · Jianyuan Guo · Ding Jia · Kai Han · Yehui Tang · Chao Zhang · Han Hu · Yunhe Wang
- 2022 Poster: Vision GNN: An Image is Worth Graph of Nodes »
  Kai Han · Yunhe Wang · Jianyuan Guo · Yehui Tang · Enhua Wu
- 2023 Poster: Stable Diffusion is Unstable »
  Chengbin Du · Yanxi Li · Zhongwei Qiu · Chang Xu
- 2023 Poster: Detecting Any Human-Object Interaction Relationship: Universal HOI Detector with Spatial Prompt Learning on Foundation Models »
  Yichao Cao · Qingfei Tang · Xiu Su · Song Chen · Shan You · Xiaobo Lu · Chang Xu
- 2023 Poster: Beyond Pretrained Features: Noisy Image Modeling Provides Adversarial Defense »
  Zunzhi You · Daochang Liu · Bohyung Han · Chang Xu
- 2023 Poster: Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism »
  Chengcheng Wang · Wei He · Ying Nie · Jianyuan Guo · Chuanjian Liu · Yunhe Wang · Kai Han
- 2023 Poster: Contrastive Sampling Chains in Diffusion Models »
  Junyu Zhang · Daochang Liu · Shichao Zhang · Chang Xu
- 2023 Poster: Towards Higher Ranks via Adversarial Weight Pruning »
  Yuchuan Tian · Hanting Chen · Tianyu Guo · Chao Xu · Yunhe Wang
- 2023 Poster: Adversarial Robustness through Random Weight Sampling »
  Yanxiang Ma · Minjing Dong · Chang Xu
- 2023 Poster: VanillaNet: the Power of Minimalism in Deep Learning »
  Hanting Chen · Yunhe Wang · Jianyuan Guo · Dacheng Tao
- 2023 Poster: When Visual Prompt Tuning Meets Source-Free Domain Adaptive Semantic Segmentation »
  Xinhong Ma · Yiming Wang · Hao Liu · Tianyu Guo · Yunhe Wang
- 2023 Poster: One-for-All: Bridge the Gap Between Heterogeneous Architectures in Knowledge Distillation »
  Zhiwei Hao · Jianyuan Guo · Kai Han · Yehui Tang · Han Hu · Yunhe Wang · Chang Xu
- 2023 Poster: Federated Learning with Manifold Regularization and Normalized Update Reaggregation »
  Xuming An · Li Shen · Han Hu · Yong Luo
- 2023 Poster: PUe: Biased Positive-Unlabeled Learning Enhancement by Causal Inference »
  Xutao Wang · Hanting Chen · Tianyu Guo · Yunhe Wang
- 2023 Poster: Rethinking Conditional Diffusion Sampling with Progressive Guidance »
  Anh-Dung Dinh · Daochang Liu · Chang Xu
- 2023 Poster: Knowledge Diffusion for Distillation »
  Tao Huang · Yuan Zhang · Mingkai Zheng · Shan You · Fei Wang · Chen Qian · Chang Xu
- 2023 Poster: Species196: A One-Million Semi-supervised Dataset for Fine-grained Species Recognition »
  Wei He · Kai Han · Ying Nie · Chengcheng Wang · Yunhe Wang
- 2023 Poster: GenImage: A Million-Scale Benchmark for Detecting AI-Generated Image »
  Mingjian Zhu · Hanting Chen · Qiangyu YAN · Xudong Huang · Guanyu Lin · Wei Li · Zhijun Tu · Hailin Hu · Jie Hu · Yunhe Wang
- 2022 Spotlight: BiMLP: Compact Binary Architectures for Vision Multi-Layer Perceptrons »
  Yixing Xu · Xinghao Chen · Yunhe Wang
- 2022 Spotlight: GhostNetV2: Enhance Cheap Operation with Long-Range Attention »
  Yehui Tang · Kai Han · Jianyuan Guo · Chang Xu · Chao Xu · Yunhe Wang
- 2022 Spotlight: Lightning Talks 2B-1 »
  Yehui Tang · Jian Wang · Zheng Chen · man zhou · Peng Gao · Chenyang Si · SHANGKUN SUN · Yixing Xu · Weihao Yu · Xinghao Chen · Kai Han · Hu Yu · Yulun Zhang · Chenhui Gou · Teli Ma · Yuanqi Chen · Yunhe Wang · Hongsheng Li · Jinjin Gu · Jianyuan Guo · Qiman Wu · Pan Zhou · Yu Zhu · Jie Huang · Chang Xu · Yichen Zhou · Haocheng Feng · Guodong Guo · yongbing zhang · Ziyi Lin · Feng Zhao · Ge Li · Junyu Han · Jinwei Gu · Jifeng Dai · Chao Xu · Xinchao Wang · Linghe Kong · Shuicheng Yan · Yu Qiao · Chen Change Loy · Xin Yuan · Errui Ding · Yunhe Wang · Deyu Meng · Jingdong Wang · Chongyi Li
- 2022 Poster: Bridge the Gap Between Architecture Spaces via A Cross-Domain Predictor »
  Yuqiao Liu · Yehui Tang · Zeqiong Lv · Yunhe Wang · Yanan Sun
- 2022 Poster: Knowledge Distillation from A Stronger Teacher »
  Tao Huang · Shan You · Fei Wang · Chen Qian · Chang Xu
- 2022 Poster: Redistribution of Weights and Activations for AdderNet Quantization »
  Ying Nie · Kai Han · Haikang Diao · Chuanjian Liu · Enhua Wu · Yunhe Wang
- 2022 Poster: GhostNetV2: Enhance Cheap Operation with Long-Range Attention »
  Yehui Tang · Kai Han · Jianyuan Guo · Chang Xu · Chao Xu · Yunhe Wang
- 2022 Poster: Accelerating Sparse Convolution with Column Vector-Wise Sparsity »
  Yijun Tan · Kai Han · Kang Zhao · Xianzhi Yu · Zidong Du · Yunji Chen · Yunhe Wang · Jun Yao
- 2022 Poster: A Transformer-Based Object Detector with Coarse-Fine Crossing Representations »
  Zhishan Li · Ying Nie · Kai Han · Jianyuan Guo · Lei Xie · Yunhe Wang
- 2022 Poster: BiMLP: Compact Binary Architectures for Vision Multi-Layer Perceptrons »
  Yixing Xu · Xinghao Chen · Yunhe Wang
- 2022 Poster: Searching for Better Spatio-temporal Alignment in Few-Shot Action Recognition »
  Yichao Cao · Xiu Su · Qingfei Tang · Shan You · Xiaobo Lu · Chang Xu
- 2022 Poster: Random Normalization Aggregation for Adversarial Defense »
  Minjing Dong · Xinghao Chen · Yunhe Wang · Chang Xu
- 2021 Poster: Adder Attention for Vision Transformer »
  Han Shu · Jiahao Wang · Hanting Chen · Lin Li · Yujiu Yang · Yunhe Wang
- 2021 Poster: Dynamic Resolution Network »
  Mingjian Zhu · Kai Han · Enhua Wu · Qiulin Zhang · Ying Nie · Zhenzhong Lan · Yunhe Wang
- 2021 Poster: Post-Training Quantization for Vision Transformer »
  Zhenhua Liu · Yunhe Wang · Kai Han · Wei Zhang · Siwei Ma · Wen Gao
- 2021 Poster: Handling Long-tailed Feature Distribution in AdderNets »
  Minjing Dong · Yunhe Wang · Xinghao Chen · Chang Xu
- 2021 Poster: Towards Stable and Robust AdderNets »
  Minjing Dong · Yunhe Wang · Xinghao Chen · Chang Xu
- 2021 Poster: Transformer in Transformer »
  Kai Han · An Xiao · Enhua Wu · Jianyuan Guo · Chunjing XU · Yunhe Wang
- 2021 Poster: An Empirical Study of Adder Neural Networks for Object Detection »
  Xinghao Chen · Chang Xu · Minjing Dong · Chunjing XU · Yunhe Wang
- 2021 Poster: Neural Architecture Dilation for Adversarial Robustness »
  Yanxi Li · Zhaohui Yang · Yunhe Wang · Chang Xu
- 2021 Poster: Learning Frequency Domain Approximation for Binary Neural Networks »
  Yixing Xu · Kai Han · Chang Xu · Yehui Tang · Chunjing XU · Yunhe Wang
- 2021 Poster: Augmented Shortcuts for Vision Transformers »
  Yehui Tang · Kai Han · Chang Xu · An Xiao · Yiping Deng · Chao Xu · Yunhe Wang
- 2021 Oral: Learning Frequency Domain Approximation for Binary Neural Networks »
  Yixing Xu · Kai Han · Chang Xu · Yehui Tang · Chunjing XU · Yunhe Wang
- 2020 Poster: SCOP: Scientific Control for Reliable Neural Network Pruning »
  Yehui Tang · Yunhe Wang · Yixing Xu · Dacheng Tao · Chunjing XU · Chao Xu · Chang Xu
- 2020 Poster: Kernel Based Progressive Distillation for Adder Neural Networks »
  Yixing Xu · Chang Xu · Xinghao Chen · Wei Zhang · Chunjing XU · Yunhe Wang
- 2020 Poster: Model Rubik's Cube: Twisting Resolution, Depth and Width for TinyNets »
  Kai Han · Yunhe Wang · Qiulin Zhang · Wei Zhang · Chunjing XU · Tong Zhang
- 2020 Poster: Adapting Neural Architectures Between Domains »
  Yanxi Li · Zhaohui Yang · Yunhe Wang · Chang Xu
- 2020 Spotlight: Kernel Based Progressive Distillation for Adder Neural Networks »
  Yixing Xu · Chang Xu · Xinghao Chen · Wei Zhang · Chunjing XU · Yunhe Wang
- 2020 Poster: Residual Distillation: Towards Portable Deep Neural Networks without Shortcuts »
  Guilin Li · Junlei Zhang · Yunhe Wang · Chuanjian Liu · Matthias Tan · Yunfeng Lin · Wei Zhang · Jiashi Feng · Tong Zhang
- 2020 Poster: UnModNet: Learning to Unwrap a Modulo Image for High Dynamic Range Imaging »
  Chu Zhou · Hang Zhao · Jin Han · Chang Xu · Chao Xu · Tiejun Huang · Boxin Shi
- 2020 Poster: Searching for Low-Bit Weights in Quantized Neural Networks »
  Zhaohui Yang · Yunhe Wang · Kai Han · Chunjing XU · Chao Xu · Dacheng Tao · Chang Xu
- 2019 Poster: Positive-Unlabeled Compression on the Cloud »
  Yixing Xu · Yunhe Wang · Hanting Chen · Kai Han · Chunjing XU · Dacheng Tao · Chang Xu
- 2019 Poster: Learning from Bad Data via Generation »
  Tianyu Guo · Chang Xu · Boxin Shi · Chao Xu · Dacheng Tao
- 2018 Poster: Learning Versatile Filters for Efficient Convolutional Neural Networks »
  Yunhe Wang · Chang Xu · Chunjing XU · Chao Xu · Dacheng Tao
- 2016 Poster: CNNpack: Packing Convolutional Neural Networks in the Frequency Domain »
  Yunhe Wang · Chang Xu · Shan You · Dacheng Tao · Chao Xu