Knowledge distillation (KD) is one of the prominent techniques for model compression. Although conventional KD is effective at matching the two networks over the given data points, there is no guarantee that the models will also match in regions for which we do not have enough training samples. In this work, we address this problem by generating new auxiliary training samples based on knowledge extracted from the backward pass, identifying the areas where the student diverges most from the teacher. This is done by perturbing data samples in the direction of the gradient of the difference between the student and the teacher. We study the effect of the proposed method on various tasks in different domains, including image and NLP tasks, with considerably smaller student networks. Our experiments show that the proposed method achieves superior results over other baselines.
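The abstract only sketches the procedure at a high level; below is a minimal PyTorch-style sketch of that idea, not the authors' released code. The discrepancy measure (MSE over logits), the step size `epsilon`, the number of perturbation steps, and the Hinton-style soft-target KD loss are all illustrative assumptions.

```python
import torch
import torch.nn.functional as F


def generate_auxiliary_samples(teacher, student, x, epsilon=0.01, steps=1):
    """Perturb a batch x toward regions of larger student-teacher disagreement."""
    teacher.eval()
    student.eval()
    x_aux = x.clone().detach()
    for _ in range(steps):
        x_aux.requires_grad_(True)
        # Discrepancy between the two networks on the current samples
        # (MSE over logits is an assumption; any divergence measure could be used).
        divergence = F.mse_loss(student(x_aux), teacher(x_aux).detach())
        (grad,) = torch.autograd.grad(divergence, x_aux)
        # Step in the direction of the gradient of the student-teacher difference.
        x_aux = (x_aux + epsilon * grad).detach()
    return x_aux


def distillation_step(teacher, student, optimizer, x, temperature=2.0):
    """One KD update that mixes original and auxiliary samples."""
    x_aux = generate_auxiliary_samples(teacher, student, x)
    student.train()
    inputs = torch.cat([x, x_aux], dim=0)
    with torch.no_grad():
        t_logits = teacher(inputs)
    s_logits = student(inputs)
    # Standard soft-target KD loss over both the original and auxiliary samples.
    loss = F.kl_div(
        F.log_softmax(s_logits / temperature, dim=-1),
        F.softmax(t_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

The sketch steps along the raw gradient scaled by `epsilon`; a sign-based (FGSM-style) step would be an equally plausible reading of "the direction of the gradient."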
Author Information
Aref Jafari (University of Waterloo)
Mehdi Rezaghoizadeh (Huawei Technologies)
Ali Ghodsi (University of Waterloo)
More from the Same Authors
- 2021: Compressing Pre-trained Language Models using Progressive Low Rank Decomposition
  Habib Hajimolahoseini · Mehdi Rezaghoizadeh · Vahid Partovi Nia · Marzieh Tahaei · Omar Mohamed Awad · Yang Liu
- 2021: Kronecker Decomposition for GPT Compression
  Ali Edalati · Marzieh Tahaei · Ahmad Rashid · Vahid Partovi Nia · James J. Clark · Mehdi Rezaghoizadeh
- 2022: Strategies for Applying Low Rank Decomposition to Transformer-Based Models
  Habib Hajimolahoseini · Walid Ahmed · Mehdi Rezaghoizadeh · Vahid Partovi Nia · Yang Liu
- 2022: DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low Rank Adaptation
  Mojtaba Valipour · Mehdi Rezaghoizadeh · Ivan Kobyzev · Ali Ghodsi
- 2022: SymbolicGPT: A Generative Transformer Model for Symbolic Regression
  Mojtaba Valipour · Bowen You · Maysum H Panju · Ali Ghodsi
- 2022: Improving the Robustness of DistilHuBERT to Unseen Noisy Conditions via Data Augmentation, Curriculum Learning, and Multi-Task Enhancement
  Heitor Guimarães · Arthur Pimentel · Anderson R. Avila · Mehdi Rezaghoizadeh · Tiago H Falk
- 2022: Attribute Controlled Dialogue Prompting
  Runcheng Liu · Ahmad Rashid · Ivan Kobyzev · Mehdi Rezaghoizadeh · Pascal Poupart
- 2021: Best Papers and Closing Remarks
  Ali Ghodsi · Pascal Poupart
- 2021: Panel Discussion
  Pascal Poupart · Ali Ghodsi · Luke Zettlemoyer · Sameer Singh · Kevin Duh · Yejin Choi · Lu Hou
- 2021: Opening of the Afternoon Session
  Ali Ghodsi
- 2021 Workshop: Efficient Natural Language and Speech Processing (Models, Training, and Inference)
  Mehdi Rezaghoizadeh · Lili Mou · Yue Dong · Pascal Poupart · Ali Ghodsi · Qun Liu