Knowledge distillation (KD) is a general neural network training approach that uses a teacher model to guide a student model. Existing works mainly study KD from the network output side (e.g., trying to design a better KD loss function), while few have attempted to understand it from the input side. In particular, its interplay with data augmentation (DA) has not been well understood. In this paper, we ask: Why do some DA schemes (e.g., CutMix) inherently perform much better than others in KD? What makes a "good" DA in KD? Our investigation from a statistical perspective suggests that a good DA scheme should reduce the covariance of the teacher-student cross-entropy. A practical metric, the stddev of the teacher's mean probability (T. stddev), is further presented and well justified empirically. Beyond the theoretical understanding, we also introduce a new entropy-based data-mixing DA scheme, CutMixPick, to further enhance CutMix. Extensive empirical studies support our claims and demonstrate how considerable performance gains can be harvested simply by using a better DA scheme in knowledge distillation. Code: https://github.com/MingSun-Tse/Good-DA-in-KD.
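The T. stddev metric mentioned above can be illustrated with a minimal sketch. This is an assumption-laden reading of the abstract, not the paper's exact definition: here we take the teacher's predicted probability on each augmented sample's ground-truth class and report its standard deviation over the augmented set (a lower value would then indicate a "better" DA scheme under the paper's claim). The function name `t_stddev` and the input layout are hypothetical.

```python
import numpy as np

def t_stddev(teacher_probs: np.ndarray, labels: np.ndarray) -> float:
    """Sketch of a T. stddev-style metric (assumed reading of the abstract).

    teacher_probs: (N, C) teacher softmax outputs on N augmented samples.
    labels:        (N,) integer ground-truth classes.
    Returns the stddev, over samples, of the teacher's probability on the label.
    """
    # Probability the teacher assigns to each sample's true class.
    p_true = teacher_probs[np.arange(len(labels)), labels]
    # Spread of those probabilities across the augmented training set.
    return float(p_true.std())

# Toy usage: two augmented samples, two classes.
probs = np.array([[0.7, 0.3],
                  [0.5, 0.5]])
labels = np.array([0, 1])
print(t_stddev(probs, labels))  # stddev of [0.7, 0.5] = 0.1
```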
Author Information
Huan Wang
Suhas Lohit (Mitsubishi Electric Research Labs)
Michael Jones (MERL)
Yun Fu (Northeastern University)
More from the Same Authors
- 2021 Spotlight: Aligned Structured Sparsity Learning for Efficient Image Super-Resolution »
  Yulun Zhang · Huan Wang · Can Qin · Yun Fu
- 2023 Poster: UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild »
  Can Qin · Shu Zhang · Ning Yu · Yihao Feng · Xinyi Yang · Yingbo Zhou · Huan Wang · Juan Carlos Niebles · Caiming Xiong · Silvio Savarese · Stefano Ermon · Yun Fu · Ran Xu
- 2023 Poster: Text-to-Image Diffusion Model on Mobile Devices within Two Seconds »
  Yanyu Li · Huan Wang · Qing Jin · Ju Hu · Pavlo Chemerys · Yun Fu · Yanzhi Wang · Sergey Tulyakov · Jian Ren
- 2023 Poster: Latent Graph Inference with Limited Supervision »
  Jianglin Lu · Yi Xu · Huan Wang · Yue Bai · Yun Fu
- 2023 Poster: Exploring Question Decomposition for Zero-Shot VQA »
  Zaid Khan · Vijay Kumar B G · Samuel Schulter · Manmohan Chandraker · Yun Fu
- 2022 Poster: Learning Partial Equivariances From Data »
  David W. Romero · Suhas Lohit
- 2022 Poster: Look More but Care Less in Video Recognition »
  Yitian Zhang · Yue Bai · Huan Wang · Yi Xu · Yun Fu
- 2022 Poster: Parameter-Efficient Masking Networks »
  Yue Bai · Huan Wang · Xu Ma · Yitian Zhang · Zhiqiang Tao · Yun Fu
- 2021 Poster: Slow Learning and Fast Inference: Efficient Graph Similarity Computation via Knowledge Distillation »
  Can Qin · Handong Zhao · Lichen Wang · Huan Wang · Yulun Zhang · Yun Fu
- 2021 Poster: Aligned Structured Sparsity Learning for Efficient Image Super-Resolution »
  Yulun Zhang · Huan Wang · Can Qin · Yun Fu
- 2020 Poster: Learning to Mutate with Hypergradient Guided Population »
  Zhiqiang Tao · Yaliang Li · Bolin Ding · Ce Zhang · Jingren Zhou · Yun Fu
- 2020 Poster: Neural Sparse Representation for Image Restoration »
  Yuchen Fan · Jiahui Yu · Yiqun Mei · Yulun Zhang · Yun Fu · Ding Liu · Thomas Huang
- 2019 Poster: PointDAN: A Multi-Scale 3D Domain Adaption Network for Point Cloud Representation »
  Can Qin · Haoxuan You · Lichen Wang · C.-C. Jay Kuo · Yun Fu
- 2017 Poster: Matching on Balanced Nonlinear Representations for Treatment Effects Estimation »
  Sheng Li · Yun Fu
- 2012 Poster: Fast Resampling Weighted v-Statistics »
  Chunxiao Zhou · Jiseong Park · Yun Fu
- 2012 Spotlight: Fast Resampling Weighted v-Statistics »
  Chunxiao Zhou · Jiseong Park · Yun Fu