Poster
Memory-Efficient Fine-Tuning of Compressed Large Language Models via sub-4-bit Integer Quantization
Jeonghoon Kim · Jung Hyun Lee · Sungdong Kim · Joonsuk Park · Kang Min Yoo · Se Jung Kwon · Dongsoo Lee
Large language models (LLMs) face challenges in fine-tuning and deployment due to their high memory demands and computational costs. While parameter-efficient fine-tuning (PEFT) methods aim to reduce the memory usage of the optimizer state during fine-tuning, the inherent size of pre-trained LLM weights remains a pressing concern. Even though quantization techniques are widely proposed to ease memory demands and accelerate LLM inference, most of these techniques are geared toward the deployment phase. To bridge this gap, this paper presents Parameter-Efficient and Quantization-aware Adaptation (PEQA), a simple yet effective method that combines the advantages of PEFT with quantized LLMs. By updating only the quantization scales, PEQA can be applied directly to quantized LLMs, ensuring seamless task transitions. Like existing PEFT methods, PEQA significantly reduces the memory overhead associated with the optimizer state. Furthermore, it leverages the advantages of quantization to substantially reduce model sizes. Even after fine-tuning, the quantization structure of a PEQA-tuned LLM remains intact, allowing for accelerated inference at the deployment stage. We employ PEQA-tuning for task-specific adaptation on LLMs with up to 65 billion parameters. To assess the logical reasoning and language comprehension of PEQA-tuned LLMs, we fine-tune low-bit quantized LLMs using an instruction dataset. Our results show that even when LLMs are quantized to below 4-bit precision, their capabilities in language modeling, few-shot in-context learning, and comprehension can be resiliently restored to (or even improved over) their full-precision original performance with PEQA.
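To make "updating only the quantization scales" concrete, the PyTorch sketch below shows a minimal PEQA-style linear layer. It is an illustration under assumed names (PEQALinear, bits, etc.), not the authors' reference implementation: the pre-trained weight is quantized once per output channel, the integer weights and zero-points are frozen as buffers, and the per-channel scale is the only trainable parameter.

```python
import torch
import torch.nn as nn

class PEQALinear(nn.Module):
    """Minimal sketch of a PEQA-style linear layer (illustrative, not the
    paper's reference code). The weight is quantized once to low-bit
    integers; during fine-tuning only the per-channel scale is updated."""

    def __init__(self, weight: torch.Tensor, bits: int = 3):
        super().__init__()
        qmax = 2 ** bits - 1
        # Per-output-channel asymmetric quantization of the frozen weight.
        w_min = weight.min(dim=1, keepdim=True).values
        w_max = weight.max(dim=1, keepdim=True).values
        scale = ((w_max - w_min) / qmax).clamp(min=1e-8)
        zero_point = torch.round(-w_min / scale)
        q = torch.clamp(torch.round(weight / scale) + zero_point, 0, qmax)

        self.register_buffer("q_weight", q.to(torch.uint8))  # frozen integers
        self.register_buffer("zero_point", zero_point)        # frozen
        self.scale = nn.Parameter(scale)                       # only trainable tensor

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Dequantize on the fly; gradients flow only into self.scale.
        w = self.scale * (self.q_weight.float() - self.zero_point)
        return x @ w.t()

# Hypothetical usage: wrap a pre-trained layer's weight and train only the scales.
# layer = PEQALinear(pretrained_linear.weight.data, bits=3)
# optimizer = torch.optim.AdamW([layer.scale], lr=1e-4)
```

Because only the scale vector receives gradients, the optimizer state grows with the number of output channels rather than the number of weights, and the integer weight matrix is untouched by fine-tuning, so the low-bit structure needed for accelerated inference is preserved after adaptation.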
Author Information
Jeonghoon Kim (NAVER CLOUD)
Jung Hyun Lee (Korea Advanced Institute of Science and Technology)
Sungdong Kim (NAVER)
Joonsuk Park (University of Richmond)
Kang Min Yoo (NAVER)
Se Jung Kwon (Samsung Research)
Dongsoo Lee (Samsung Research)
More from the Same Authors
- 2021: KLUE: Korean Language Understanding Evaluation »
  Sungjoon Park · Jihyung Moon · Sungdong Kim · Won Ik Cho · Ji Yoon Han · Jangwon Park · Chisung Song · Junseong Kim · Youngsook Song · Taehwan Oh · Joohong Lee · Juhyun Oh · Sungwon Lyu · Younghoon Jeong · Inkwon Lee · Sangwoo Seo · Dongjun Lee · Hyunwoo Kim · Myeonghwa Lee · Seongbo Jang · Seungwon Do · Sunkyoung Kim · Kyungtae Lim · Jongwon Lee · Kyumin Park · Jamin Shin · Seonghyun Kim · Lucy Park · Alice Oh · Jung-Woo Ha · Kyunghyun Cho
- 2023: Prometheus: Inducing Evaluation Capability in Language Models »
  Seungone Kim · Jamin Shin · Yejin Cho · Joel Jang · Shayne Longpre · Hwaran Lee · Sangdoo Yun · Seongjin Shin · Sungdong Kim · James Thorne · Minjoon Seo
- 2023: FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets »
  Seonghyeon Ye · Doyoung Kim · Sungdong Kim · Hyeonbin Hwang · Seungone Kim · Yongrae Jo · James Thorne · Juho Kim · Minjoon Seo
- 2022 Poster: Mutual Information Divergence: A Unified Metric for Multimodal Generative Models »
  Jin-Hwa Kim · Yunji Kim · Jiyoung Lee · Kang Min Yoo · Sang-Woo Lee
- 2022 Poster: Maximum Likelihood Training of Implicit Nonlinear Diffusion Model »
  Dongjun Kim · Byeonghu Na · Se Jung Kwon · Dongsoo Lee · Wanmo Kang · Il-chul Moon
- 2020 Poster: FleXOR: Trainable Fractional Quantization »
  Dongsoo Lee · Se Jung Kwon · Byeongwook Kim · Yongkweon Jeon · Baeseong Park · Jeongin Yun