Poster

In-Context Learning with Representations: Contextual Generalization of Trained Transformers

Tong Yang · Yu Huang · Yingbin Liang · Yuejie Chi

West Ballroom A-D #7200
Thu 12 Dec 4:30 p.m. PST — 7:30 p.m. PST

Abstract: In-context learning (ICL) refers to a remarkable capability of pre-trained large language models: they can learn a new task from a few examples provided during inference. However, the theoretical understanding of in-context learning remains largely under-explored, particularly whether transformers can be trained to generalize to unseen examples in a prompt, which requires the model to acquire contextual knowledge of the prompt. This paper investigates the training dynamics of transformers trained by gradient descent through the lens of non-linear regression tasks. Contextual generalization here is attained via in-context learning of the template function for each task, where all template functions lie in a linear space spanned by $m$ basis functions. We analyze the training dynamics of multi-head transformers that predict unlabeled inputs in context given partially labeled prompts, where the labels contain Gaussian noise and each prompt may contain only a few examples, which are insufficient to determine the template. We show that the training loss of a shallow multi-head transformer converges linearly to a global minimum. Moreover, the transformer effectively learns to perform ridge regression. To our knowledge, this is the first study to show that transformers can learn contextual (i.e., template) information to generalize to unseen examples when prompts contain only a small number of query-answer pairs.
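The ridge-regression behavior described in the abstract can be illustrated with a minimal sketch, assuming a hypothetical monomial basis and a hand-picked regularization strength $\lambda$ (neither is specified by the paper): given a prompt with only a few noisy labeled examples of a template function $f(x) = \sum_j w_j \phi_j(x)$, the ridge estimate of the coefficients yields a prediction for an unlabeled query input.

```python
import numpy as np

M = 3  # number of basis functions (illustrative choice)

def features(x, m=M):
    # Hypothetical basis phi_j(x) = x^j, j = 0..m-1 (the paper's basis is abstract)
    return np.stack([x**j for j in range(m)], axis=-1)

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])            # template coefficients for this task

# Prompt: only 2 labeled examples, fewer than m, so the template is underdetermined
x_ctx = rng.uniform(-1.0, 1.0, size=2)
y_ctx = features(x_ctx) @ w_true + 0.1 * rng.standard_normal(2)  # Gaussian label noise

# Ridge regression on the prompt: w_hat = (Phi^T Phi + lam I)^{-1} Phi^T y
lam = 0.1
Phi = features(x_ctx)
w_hat = np.linalg.solve(Phi.T @ Phi + lam * np.eye(M), Phi.T @ y_ctx)

# Prediction for an unlabeled query input in the same prompt
x_query = np.array([0.3])
y_pred = features(x_query) @ w_hat
```

Because the prompt has fewer examples than basis functions, the ordinary least-squares solution is not unique; the ridge penalty selects a well-defined estimate, which is the behavior the paper shows the trained transformer implements.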
