Knowledge Distillation (KD) aims to transfer the knowledge of a well-performing neural network (the {\it teacher}) to a weaker one (the {\it student}). A peculiar phenomenon is that a more accurate teacher does not necessarily teach better, and adjusting the temperature cannot alleviate this capacity mismatch either. To explain this, we decompose the efficacy of KD into three parts: {\it correct guidance}, {\it smooth regularization}, and {\it class discriminability}. The last term describes how distinct the {\it wrong class probabilities} provided by the teacher are. Complex teachers tend to be over-confident, and traditional temperature scaling limits the efficacy of {\it class discriminability}, resulting in less discriminative wrong class probabilities. Therefore, we propose {\it Asymmetric Temperature Scaling (ATS)}, which applies a higher temperature to the correct class and a lower temperature to the wrong classes. ATS enlarges the variance of the wrong class probabilities in the teacher's soft label, keeping the absolute affinities of the wrong classes to the target class as discriminative as possible for the student. Both theoretical analysis and extensive experimental results demonstrate the effectiveness of ATS. The demo developed in MindSpore is available at \url{https://gitee.com/lxcnju/ats-mindspore} and will be available at \url{https://gitee.com/mindspore/models/tree/master/research/cv/ats}.
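To make the asymmetric softening concrete, below is a minimal NumPy sketch of how a teacher's logits could be turned into ATS soft labels, with a higher temperature on the correct class and a lower one on the wrong classes. The function name, argument names, and temperature values are illustrative assumptions and do not reproduce the released MindSpore demo.

import numpy as np

def ats_softmax(logits, target, tau_correct=4.0, tau_wrong=2.0):
    """Soften teacher logits with asymmetric temperatures (illustrative sketch).

    The target-class logit is divided by the higher temperature (tau_correct),
    while the wrong-class logits are divided by the lower temperature
    (tau_wrong), which keeps the wrong-class probabilities more spread out
    than a single shared temperature would.
    """
    logits = np.asarray(logits, dtype=np.float64)
    scaled = logits / tau_wrong                      # wrong classes: lower temperature
    scaled[target] = logits[target] / tau_correct    # correct class: higher temperature
    scaled -= scaled.max()                           # subtract max for numerical stability
    exp = np.exp(scaled)
    return exp / exp.sum()

# Toy example: an over-confident 5-class teacher logit vector with target class 0.
print(ats_softmax([9.0, 2.5, 1.0, 0.3, -1.2], target=0))

With a single shared temperature, raising it enough to soften an over-confident target probability also flattens the wrong-class probabilities toward uniform; splitting the temperatures keeps the wrong-class distribution informative for the student.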
Author Information
Xin-Chun Li (Nanjing University)
Wen-shu Fan (Nanjing University)
Shaoming Song (Noah's Ark Lab, Huawei Technologies Ltd.)
Yinchuan Li (Huawei Technologies Ltd.)
Bingshuai Li (Huawei Technologies Ltd.)
Shao Yunfeng (Huawei Technologies Co., Ltd.)
De-Chuan Zhan (Nanjing University)
More from the Same Authors
- 2022 Poster: Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief
  Kaiyang Guo · Shao Yunfeng · Yanhui Geng
- 2023 Poster: Model Spider: Learning to Rank Pre-Trained Models Efficiently
  Yi-Kai Zhang · Ting-Ji Huang · Yao-Xiang Ding · De-Chuan Zhan · Han-Jia Ye
- 2023 Poster: Few-Shot Class-Incremental Learning via Training-Free Prototype Calibration
  Qi-wei Wang · Da-Wei Zhou · Yi-Kai Zhang · De-Chuan Zhan · Han-Jia Ye
- 2023 Poster: Beyond probability partitions: Calibrating neural networks with semantic aware grouping
  Jia-Qi Yang · De-Chuan Zhan · Le Gan
- 2022 Spotlight: Asymmetric Temperature Scaling Makes Larger Networks Teach Well Again
  Xin-Chun Li · Wen-shu Fan · Shaoming Song · Yinchuan Li · Bingshuai Li · Shao Yunfeng · De-Chuan Zhan
- 2022 Spotlight: Lightning Talks 5A-2
  Qiang LI · Zhiwei Xu · Jia-Qi Yang · Thai Hung Le · Haoxuan Qu · Yang Li · Artyom Sorokin · Peirong Zhang · Mira Finkelstein · Nitsan levy · Chung-Yiu Yau · dapeng li · Thommen Karimpanal George · De-Chuan Zhan · Nazar Buzun · Jiajia Jiang · Li Xu · Yichuan Mo · Yujun Cai · Yuliang Liu · Leonid Pugachev · Bin Zhang · Lucy Liu · Hoi-To Wai · Liangliang Shi · Majid Abdolshah · Yoav Kolumbus · Lin Geng Foo · Junchi Yan · Mikhail Burtsev · Lianwen Jin · Yuan Zhan · Dung Nguyen · David Parkes · Yunpeng Baiia · Jun Liu · Kien Do · Guoliang Fan · Jeffrey S Rosenschein · Sunil Gupta · Sarah Keren · Svetha Venkatesh
- 2022 Spotlight: Model-Based Offline Reinforcement Learning with Pessimism-Modulated Dynamics Belief
  Kaiyang Guo · Shao Yunfeng · Yanhui Geng
- 2022 Spotlight: Generalized Delayed Feedback Model with Post-Click Information in Recommender Systems
  Jia-Qi Yang · De-Chuan Zhan
- 2022 Poster: Generalized Delayed Feedback Model with Post-Click Information in Recommender Systems
  Jia-Qi Yang · De-Chuan Zhan
- 2021 Poster: Towards Enabling Meta-Learning from Target Models
  Su Lu · Han-Jia Ye · Le Gan · De-Chuan Zhan
- 2016 Poster: What Makes Objects Similar: A Unified Multi-Metric Learning Approach
  Han-Jia Ye · De-Chuan Zhan · Xue-Min Si · Yuan Jiang · Zhi-Hua Zhou