Workshop: Second Workshop on Efficient Natural Language and Speech Processing (ENLSP-II)

Improved Knowledge Distillation by Utilizing Backward Pass Knowledge in Neural Networks

Aref Jafari · Mehdi Rezagholizadeh · Ali Ghodsi

Keywords: [ ENLSP-Main ]


Knowledge distillation (KD) is a prominent technique for model compression. Conventional KD is effective at matching the student network to the teacher network on the given data points, but there is no guarantee that the two models will match in regions of the input space for which we do not have enough training samples. In this work, we address this problem by generating new auxiliary training samples, using knowledge extracted from the backward pass to identify the regions where the student diverges greatly from the teacher. This is done by perturbing data samples in the direction of the gradient of the difference between the student and the teacher. We study the effect of the proposed method on various tasks in different domains, including image and NLP tasks, with considerably smaller student networks. Our experiments show that the proposed method outperforms the other baselines.
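
The perturbation step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy teacher and student functions, the squared-difference divergence, the step size, and the number of steps are all assumptions chosen for illustration, and central finite differences stand in for the backward pass so that the example needs no deep learning library.

```python
import math

def teacher(x):
    # Hypothetical teacher: a smooth nonlinear function of a 2-D input.
    return math.sin(x[0]) + 0.5 * x[1] ** 2

def student(x):
    # Hypothetical, much simpler student: a linear function of the input.
    return x[0] + 0.5 * x[1]

def divergence(x):
    # Assumed measure of student-teacher mismatch: squared output difference.
    return (student(x) - teacher(x)) ** 2

def input_gradient(f, x, eps=1e-5):
    # Central finite-difference gradient of f with respect to the input x.
    # In a real network this would come from the backward pass instead.
    grad = []
    for i in range(len(x)):
        hi, lo = list(x), list(x)
        hi[i] += eps
        lo[i] -= eps
        grad.append((f(hi) - f(lo)) / (2 * eps))
    return grad

def perturb(x, step_size=0.1, n_steps=5):
    # Move the sample along the gradient of the divergence (gradient ascent),
    # toward a region where the student disagrees more with the teacher.
    x_aux = list(x)
    for _ in range(n_steps):
        g = input_gradient(divergence, x_aux)
        x_aux = [xi + step_size * gi for xi, gi in zip(x_aux, g)]
    return x_aux

x0 = [1.0, 1.0]          # an original training sample (illustrative)
x_aux = perturb(x0)      # auxiliary sample generated from it
print("divergence at original sample: ", divergence(x0))
print("divergence at auxiliary sample:", divergence(x_aux))
```

In this sketch the auxiliary sample would then be added to the distillation training set, labeled by the teacher's output, so that the student is also trained where it currently diverges from the teacher.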
