Timezone: »

PEST: Combining Parameter-Efficient Fine-Tuning with Self-Training and Co-Training
Hunter Lang · Monica Agrawal · Yoon Kim · David Sontag

We demonstrate how to improve the zero-shot and few-shot performance of large language models (LLMs) by using the T-Few parameter-efficient fine-tuning method (Liu et al., 2022) with self-training or co-training. Our methods apply to settings where labeled data is very limited, but unlabeled data is plentiful. Specifically, we combine T-Few with (i) the co-training techniques of Lang et al. (2022a), and (ii) SETRED, a self-training algorithm that uses a very simple data selection criterion (Li and Zhou, 2005). By using the efficient T-Few method, we are able to scale co-training to larger models (from T0-3B to T0-11B) and cut down on wallclock training time, improving the zero-shot co-training results of Lang et al. 2022a). By performing multiple iterations of self- or co-training, we significantly improve over the few-shot performance of T-Few reported by Liu et al. (2022) without using any additional labeled data. Our methods are relatively fast (2.5 hours to self-train T0-11B on a single A100 80GB) and allow T0-11B to match the few-shot performance of models with an order of magnitude more parameters.

Author Information

Hunter Lang (MIT)
Monica Agrawal (Massachusetts Institute of Technology)
Yoon Kim (Massachusetts Institute of Technology)
David Sontag (MIT)

More from the Same Authors