Pre-trained Language Models (PLMs) have been successful on a wide range of natural language processing (NLP) tasks. However, state-of-the-art (SOTA) PLMs are too large to be deployed on edge devices, so model compression has attracted increasing attention in the NLP community. Most existing work focuses on compressing encoder-based models (TinyBERT, DistilBERT, DistilRoBERTa, etc.); to the best of our knowledge, however, the compression of decoder-based models (such as GPT-2) has received much less attention. Our paper aims to fill this gap. Specifically, we explore two directions: 1) we employ current SOTA knowledge distillation techniques to improve the fine-tuning of DistilGPT-2; 2) we pre-train a compressed GPT-2 model using layer truncation and compare it against distillation-based methods. Our compressed model requires significantly less training time than DistilGPT-2, yet achieves better performance when fine-tuned on downstream tasks. We also demonstrate the impact of data cleaning on model performance.
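To make the two directions concrete, the sketch below illustrates (1) layer truncation of GPT-2 and (2) a standard soft-target distillation loss, assuming the Hugging Face transformers library. This is a minimal illustration, not the authors' implementation: the number of retained layers, the choice of keeping the first blocks, and the temperature are assumptions, since the abstract does not specify them.

```python
# Minimal sketch (assumptions noted above), using Hugging Face transformers.
import torch.nn.functional as F
from transformers import GPT2LMHeadModel

def truncate_gpt2(num_layers: int = 6) -> GPT2LMHeadModel:
    """Build a smaller GPT-2 by dropping transformer blocks before further pre-training."""
    model = GPT2LMHeadModel.from_pretrained("gpt2")          # 12-layer GPT-2 base
    model.transformer.h = model.transformer.h[:num_layers]   # keep a subset of the blocks (assumed: first ones)
    model.config.n_layer = num_layers                        # keep the config consistent with the new depth
    return model

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """Soft-target KD loss: KL divergence between temperature-scaled teacher/student distributions."""
    s = F.log_softmax(student_logits / temperature, dim=-1)
    t = F.softmax(teacher_logits / temperature, dim=-1)
    return F.kl_div(s, t, reduction="batchmean") * temperature ** 2
```

Keeping the lower blocks is only one common heuristic for truncation; selecting alternating or upper layers is an equally plausible choice and may be what the paper evaluates.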
Author Information
Tianda Li (Noah's Ark Lab, Montreal)
Yassir El Mesbahi (Huawei)
Ivan Kobyzev (Huawei Noah's Ark Lab)
Ahmad Rashid (Huawei Technologies)
Atif Mahmud (Huawei Noah's Ark Lab)
Nithin Anchuri (Huawei Noah's Ark Lab)
Habib Hajimolahoseini (Huawei Toronto Research Centre)
Yang Liu (Huawei Canada)
Mehdi Rezagholizadeh (Huawei Technologies)
More from the Same Authors
- 2021: NATURE: Natural Auxiliary Text Utterances for Realistic Spoken Language Evaluation
  David Alfonso-Hermelo · Ahmad Rashid · Abbas Ghaddar · Philippe Langlais · Mehdi Rezagholizadeh
- 2021: Compressing Pre-trained Language Models using Progressive Low Rank Decomposition
  Habib Hajimolahoseini · Mehdi Rezagholizadeh · Vahid Partovi Nia · Marzieh Tahaei · Omar Mohamed Awad · Yang Liu
- 2021: Kronecker Decomposition for GPT Compression
  Ali Edalati · Marzieh Tahaei · Ahmad Rashid · Vahid Partovi Nia · James J. Clark · Mehdi Rezagholizadeh
- 2022: Strategies for Applying Low Rank Decomposition to Transformer-Based Models
  Habib Hajimolahoseini · Walid Ahmed · Mehdi Rezagholizadeh · Vahid Partovi Nia · Yang Liu
- 2022: DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low Rank Adaptation
  Mojtaba Valipour · Mehdi Rezagholizadeh · Ivan Kobyzev · Ali Ghodsi
- 2022: Attribute Controlled Dialogue Prompting
  Runcheng Liu · Ahmad Rashid · Ivan Kobyzev · Mehdi Rezagholizadeh · Pascal Poupart
- 2023: GQKVA: Efficient Pre-training of Transformers by Grouping Queries, Keys, and Values
  Farnoosh Javadi · Walid Ahmed · Habib Hajimolahoseini · Foozhan Ataiefard · Mohammad Hassanpour · Saina Asani · Austin Wen · Omar Mohamed Awad · Kangling Liu · Yang Liu
- 2023: KronA: Parameter Efficient Tuning with Kronecker Adapter
  Ali Edalati · Marzieh Tahaei · Ivan Kobyzev · Vahid Partovi Nia · James J. Clark · Mehdi Rezagholizadeh
- 2023: SwiftLearn: A Data-Efficient Training Method of Deep Learning Models using Importance Sampling
  Habib Hajimolahoseini · Omar Mohamed Awad · Walid Ahmed · Austin Wen · Saina Asani · Mohammad Hassanpour · Farnoosh Javadi · Mehdi Ahmadi · Foozhan Ataiefard · Kangling Liu · Yang Liu
- 2023 Workshop: Third Workshop on Efficient Natural Language and Speech Processing (ENLSP-III): Towards the Future of Large Language Models and their Emerging Descendants
  Mehdi Rezagholizadeh · Peyman Passban · Yue Dong · Yu Cheng · Soheila Samiee · Lili Mou · Qun Liu · Boxing Chen
- 2023: Opening Speech
  Mehdi Rezagholizadeh
- 2022 Workshop: Second Workshop on Efficient Natural Language and Speech Processing (ENLSP-II)
  Mehdi Rezagholizadeh · Peyman Passban · Yue Dong · Lili Mou · Pascal Poupart · Ali Ghodsi · Qun Liu