Timezone: »
Knowledge distillation (KD) has proven to be a highly effective approach for enhancing model performance through a teacher-student training scheme. However, most existing distillation methods are designed under the assumption that the teacher and student models belong to the same model family, particularly the hint-based approaches. By using centered kernel alignment (CKA) to compare the learned features between heterogeneous teacher and student models, we observe significant feature divergence. This divergence illustrates the ineffectiveness of previous hint-based methods in cross-architecture distillation. To tackle the challenge in distilling heterogeneous models, we propose a simple yet effective one-for-all KD framework called OFA-KD, which significantly improves the distillation performance between heterogeneous architectures. Specifically, we project intermediate features into an aligned latent space such as the logits space, where architecture-specific information is discarded. Additionally, we introduce an adaptive target enhancement scheme to prevent the student from being disturbed by irrelevant information. Extensive experiments with various architectures, including CNN, Transformer, and MLP, demonstrate the superiority of our OFA-KD framework in enabling distillation between heterogeneous architectures. Specifically, when equipped with our OFA-KD, the student models achieve notable performance improvements, with a maximum gain of 8.0% on the CIFAR-100 dataset and 0.7% on the ImageNet-1K dataset. PyTorch code and checkpoints can be found at https://github.com/Hao840/OFAKD.
Author Information
Zhiwei Hao (Beijing Institute of Technology)
Jianyuan Guo (University of Sydney)
Kai Han (Huawei Noah's Ark Lab)
Yehui Tang (Peking University)
Han Hu (Beijing Institute of Technology)
Yunhe Wang (Huawei Noah's Ark Lab)
Chang Xu (University of Sydney)
More from the Same Authors
-
2020 Meetup: MeetUp: Sydney Australia »
Chang Xu -
2021 Meetup: Sydney, Australia »
Chang Xu -
2022 Poster: Learning Efficient Vision Transformers via Fine-Grained Manifold Distillation »
Zhiwei Hao · Jianyuan Guo · Ding Jia · Kai Han · Yehui Tang · Chao Zhang · Han Hu · Yunhe Wang -
2022 Poster: Vision GNN: An Image is Worth Graph of Nodes »
Kai Han · Yunhe Wang · Jianyuan Guo · Yehui Tang · Enhua Wu -
2023 Poster: Stable Diffusion is Unstable »
Chengbin Du · Yanxi Li · Zhongwei Qiu · Chang Xu -
2023 Poster: Detecting Any Human-Object Interaction Relationship: Universal HOI Detector with Spatial Prompt Learning on Foundation Models »
Yichao Cao · Qingfei Tang · Xiu Su · Song Chen · Shan You · Xiaobo Lu · Chang Xu -
2023 Poster: Beyond Pretrained Features: Noisy Image Modeling Provides Adversarial Defense »
Zunzhi You · Daochang Liu · Bohyung Han · Chang Xu -
2023 Poster: Gold-YOLO: Efficient Object Detector via Gather-and-Distribute Mechanism »
Chengcheng Wang · Wei He · Ying Nie · Jianyuan Guo · Chuanjian Liu · Yunhe Wang · Kai Han -
2023 Poster: Contrastive Sampling Chains in Diffusion Models »
Junyu Zhang · Daochang Liu · Shichao Zhang · Chang Xu -
2023 Poster: Towards Higher Ranks via Adversarial Weight Pruning »
Yuchuan Tian · Hanting Chen · Tianyu Guo · Chao Xu · Yunhe Wang -
2023 Poster: Adversarial Robustness through Random Weight Sampling »
Yanxiang Ma · Minjing Dong · Chang Xu -
2023 Poster: VanillaNet: the Power of Minimalism in Deep Learning »
Hanting Chen · Yunhe Wang · Jianyuan Guo · Dacheng Tao -
2023 Poster: When Visual Prompt Tuning Meets Source-Free Domain Adaptive Semantic Segmentation »
Xinhong Ma · Yiming Wang · Hao Liu · Tianyu Guo · Yunhe Wang -
2023 Poster: Federated Learning with Manifold Regularization and Normalized Update Reaggregation »
Xuming An · Li Shen · Han Hu · Yong Luo -
2023 Poster: PUe: Biased Positive-Unlabeled Learning Enhancement by Causal Inference »
Xutao Wang · Hanting Chen · Tianyu Guo · Yunhe Wang -
2023 Poster: Revisit the Power of Vanilla Knowledge Distillation: from Small Scale to Large Scale »
Zhiwei Hao · Jianyuan Guo · Kai Han · Han Hu · Chang Xu · Yunhe Wang -
2023 Poster: Rethinking Conditional Diffusion Sampling with Progressive Guidance »
Anh-Dung Dinh · Daochang Liu · Chang Xu -
2023 Poster: Knowledge Diffusion for Distillation »
Tao Huang · Yuan Zhang · Mingkai Zheng · Shan You · Fei Wang · Chen Qian · Chang Xu -
2023 Poster: Species196: A One-Million Semi-supervised Dataset for Fine-grained Species Recognition »
Wei He · Kai Han · Ying Nie · Chengcheng Wang · Yunhe Wang -
2023 Poster: GenImage: A Million-Scale Benchmark for Detecting AI-Generated Image »
Mingjian Zhu · Hanting Chen · Qiangyu YAN · Xudong Huang · Guanyu Lin · Wei Li · Zhijun Tu · Hailin Hu · Jie Hu · Yunhe Wang -
2022 Spotlight: BiMLP: Compact Binary Architectures for Vision Multi-Layer Perceptrons »
Yixing Xu · Xinghao Chen · Yunhe Wang -
2022 Spotlight: GhostNetV2: Enhance Cheap Operation with Long-Range Attention »
Yehui Tang · Kai Han · Jianyuan Guo · Chang Xu · Chao Xu · Yunhe Wang -
2022 Spotlight: Lightning Talks 2B-1 »
Yehui Tang · Jian Wang · Zheng Chen · man zhou · Peng Gao · Chenyang Si · SHANGKUN SUN · Yixing Xu · Weihao Yu · Xinghao Chen · Kai Han · Hu Yu · Yulun Zhang · Chenhui Gou · Teli Ma · Yuanqi Chen · Yunhe Wang · Hongsheng Li · Jinjin Gu · Jianyuan Guo · Qiman Wu · Pan Zhou · Yu Zhu · Jie Huang · Chang Xu · Yichen Zhou · Haocheng Feng · Guodong Guo · yongbing zhang · Ziyi Lin · Feng Zhao · Ge Li · Junyu Han · Jinwei Gu · Jifeng Dai · Chao Xu · Xinchao Wang · Linghe Kong · Shuicheng Yan · Yu Qiao · Chen Change Loy · Xin Yuan · Errui Ding · Yunhe Wang · Deyu Meng · Jingdong Wang · Chongyi Li -
2022 Poster: Bridge the Gap Between Architecture Spaces via A Cross-Domain Predictor »
Yuqiao Liu · Yehui Tang · Zeqiong Lv · Yunhe Wang · Yanan Sun -
2022 Poster: Knowledge Distillation from A Stronger Teacher »
Tao Huang · Shan You · Fei Wang · Chen Qian · Chang Xu -
2022 Poster: Redistribution of Weights and Activations for AdderNet Quantization »
Ying Nie · Kai Han · Haikang Diao · Chuanjian Liu · Enhua Wu · Yunhe Wang -
2022 Poster: GhostNetV2: Enhance Cheap Operation with Long-Range Attention »
Yehui Tang · Kai Han · Jianyuan Guo · Chang Xu · Chao Xu · Yunhe Wang -
2022 Poster: Accelerating Sparse Convolution with Column Vector-Wise Sparsity »
Yijun Tan · Kai Han · Kang Zhao · Xianzhi Yu · Zidong Du · Yunji Chen · Yunhe Wang · Jun Yao -
2022 Poster: A Transformer-Based Object Detector with Coarse-Fine Crossing Representations »
Zhishan Li · Ying Nie · Kai Han · Jianyuan Guo · Lei Xie · Yunhe Wang -
2022 Poster: BiMLP: Compact Binary Architectures for Vision Multi-Layer Perceptrons »
Yixing Xu · Xinghao Chen · Yunhe Wang -
2022 Poster: Searching for Better Spatio-temporal Alignment in Few-Shot Action Recognition »
Yichao Cao · Xiu Su · Qingfei Tang · Shan You · Xiaobo Lu · Chang Xu -
2022 Poster: Random Normalization Aggregation for Adversarial Defense »
Minjing Dong · Xinghao Chen · Yunhe Wang · Chang Xu -
2021 Poster: Adder Attention for Vision Transformer »
Han Shu · Jiahao Wang · Hanting Chen · Lin Li · Yujiu Yang · Yunhe Wang -
2021 Poster: Dynamic Resolution Network »
Mingjian Zhu · Kai Han · Enhua Wu · Qiulin Zhang · Ying Nie · Zhenzhong Lan · Yunhe Wang -
2021 Poster: Post-Training Quantization for Vision Transformer »
Zhenhua Liu · Yunhe Wang · Kai Han · Wei Zhang · Siwei Ma · Wen Gao -
2021 Poster: Handling Long-tailed Feature Distribution in AdderNets »
Minjing Dong · Yunhe Wang · Xinghao Chen · Chang Xu -
2021 Poster: Towards Stable and Robust AdderNets »
Minjing Dong · Yunhe Wang · Xinghao Chen · Chang Xu -
2021 Poster: Transformer in Transformer »
Kai Han · An Xiao · Enhua Wu · Jianyuan Guo · Chunjing XU · Yunhe Wang -
2021 Poster: An Empirical Study of Adder Neural Networks for Object Detection »
Xinghao Chen · Chang Xu · Minjing Dong · Chunjing XU · Yunhe Wang -
2021 Poster: Neural Architecture Dilation for Adversarial Robustness »
Yanxi Li · Zhaohui Yang · Yunhe Wang · Chang Xu -
2021 Poster: Learning Frequency Domain Approximation for Binary Neural Networks »
Yixing Xu · Kai Han · Chang Xu · Yehui Tang · Chunjing XU · Yunhe Wang -
2021 Poster: Augmented Shortcuts for Vision Transformers »
Yehui Tang · Kai Han · Chang Xu · An Xiao · Yiping Deng · Chao Xu · Yunhe Wang -
2021 Oral: Learning Frequency Domain Approximation for Binary Neural Networks »
Yixing Xu · Kai Han · Chang Xu · Yehui Tang · Chunjing XU · Yunhe Wang -
2020 Poster: SCOP: Scientific Control for Reliable Neural Network Pruning »
Yehui Tang · Yunhe Wang · Yixing Xu · Dacheng Tao · Chunjing XU · Chao Xu · Chang Xu -
2020 Poster: Kernel Based Progressive Distillation for Adder Neural Networks »
Yixing Xu · Chang Xu · Xinghao Chen · Wei Zhang · Chunjing XU · Yunhe Wang -
2020 Poster: Model Rubik’s Cube: Twisting Resolution, Depth and Width for TinyNets »
Kai Han · Yunhe Wang · Qiulin Zhang · Wei Zhang · Chunjing XU · Tong Zhang -
2020 Poster: Adapting Neural Architectures Between Domains »
Yanxi Li · Zhaohui Yang · Yunhe Wang · Chang Xu -
2020 Spotlight: Kernel Based Progressive Distillation for Adder Neural Networks »
Yixing Xu · Chang Xu · Xinghao Chen · Wei Zhang · Chunjing XU · Yunhe Wang -
2020 Poster: Residual Distillation: Towards Portable Deep Neural Networks without Shortcuts »
Guilin Li · Junlei Zhang · Yunhe Wang · Chuanjian Liu · Matthias Tan · Yunfeng Lin · Wei Zhang · Jiashi Feng · Tong Zhang -
2020 Poster: UnModNet: Learning to Unwrap a Modulo Image for High Dynamic Range Imaging »
Chu Zhou · Hang Zhao · Jin Han · Chang Xu · Chao Xu · Tiejun Huang · Boxin Shi -
2020 Poster: Searching for Low-Bit Weights in Quantized Neural Networks »
Zhaohui Yang · Yunhe Wang · Kai Han · Chunjing XU · Chao Xu · Dacheng Tao · Chang Xu -
2019 Poster: Positive-Unlabeled Compression on the Cloud »
Yixing Xu · Yunhe Wang · Hanting Chen · Kai Han · Chunjing XU · Dacheng Tao · Chang Xu -
2019 Poster: Learning from Bad Data via Generation »
Tianyu Guo · Chang Xu · Boxin Shi · Chao Xu · Dacheng Tao -
2018 Poster: Learning Versatile Filters for Efficient Convolutional Neural Networks »
Yunhe Wang · Chang Xu · Chunjing XU · Chao Xu · Dacheng Tao -
2016 Poster: CNNpack: Packing Convolutional Neural Networks in the Frequency Domain »
Yunhe Wang · Chang Xu · Shan You · Dacheng Tao · Chao Xu