Neural sequence models based on the transformer architecture have demonstrated remarkable \emph{in-context learning} (ICL) abilities, where they can perform new tasks when prompted with training and test examples, without any parameter update to the model. This work first provides a comprehensive statistical theory for transformers to perform ICL. Concretely, we show that transformers can implement a broad class of standard machine learning algorithms in context, such as least squares, ridge regression, Lasso, learning generalized linear models, and gradient descent on two-layer neural networks, with near-optimal predictive power on various in-context data distributions. Using an efficient implementation of in-context gradient descent as the underlying mechanism, our transformer constructions admit mild size bounds and can be learned with polynomially many pretraining sequences. Building on these ``base'' ICL algorithms, intriguingly, we show that transformers can implement more complex ICL procedures involving \emph{in-context algorithm selection}, akin to what a statistician can do in real life: a \emph{single} transformer can adaptively select different base ICL algorithms, or even perform qualitatively different tasks, on different input sequences, without any explicit prompting of the right algorithm or task. We establish this in theory via explicit constructions, and also observe the phenomenon experimentally. In theory, we construct two general mechanisms for algorithm selection, with concrete examples: pre-ICL testing and post-ICL validation. As an example, we use the post-ICL validation mechanism to construct a transformer that can perform nearly Bayes-optimal ICL on a challenging task: noisy linear models with mixed noise levels. Experimentally, we demonstrate the strong in-context algorithm selection capabilities of standard transformer architectures.
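To make the two ingredients above concrete, here is a minimal, purely illustrative sketch (plain NumPy, not the paper's transformer construction): gradient descent on a ridge objective over the in-context examples stands in for a base ICL algorithm, and a post-ICL-validation-style step selects among candidate regularization levels using a held-out portion of the prompt. All function names and parameter values below are assumptions made for illustration.

```python
import numpy as np

def icl_gradient_descent_ridge(X, y, lam=0.1, lr=0.1, steps=200):
    # Gradient descent on the ridge objective over the prompt examples (X, y).
    # This mimics the "in-context gradient descent" base algorithm, run
    # explicitly here rather than inside transformer weights.
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (X @ w - y) / n + lam * w
        w -= lr * grad
    return w

def post_icl_validation(X, y, lams=(0.0, 0.1, 1.0), holdout_frac=0.3):
    # Toy analogue of post-ICL validation: fit each candidate base algorithm
    # on a prefix of the prompt, score it on the held-out suffix, and keep
    # the predictor with the smallest validation error.
    n_train = int(len(y) * (1 - holdout_frac))
    best_w, best_err = None, np.inf
    for lam in lams:
        w = icl_gradient_descent_ridge(X[:n_train], y[:n_train], lam=lam)
        err = np.mean((X[n_train:] @ w - y[n_train:]) ** 2)
        if err < best_err:
            best_w, best_err = w, err
    return best_w

# A noisy linear-regression prompt, as in the mixed-noise-level task.
rng = np.random.default_rng(0)
n, d, sigma = 40, 5, 0.5
w_star = rng.normal(size=d) / np.sqrt(d)
X = rng.normal(size=(n, d))
y = X @ w_star + sigma * rng.normal(size=n)
x_query = rng.normal(size=d)
print("prediction on query:", x_query @ post_icl_validation(X, y))
```

In the paper's constructions, both steps are carried out by attention and MLP layers within a single forward pass of the transformer; the sketch only spells out the statistical procedure being emulated.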
Author Information
Yu Bai (Salesforce Research)
Fan Chen (MIT)
Huan Wang (Salesforce Research)
Caiming Xiong (Salesforce Research)
Song Mei (University of California, Berkeley)
Related Events (a corresponding poster, oral, or spotlight)
- 2023 Poster: Transformers as Statisticians: Provable In-Context Learning with In-Context Algorithm Selection »
  Wed, Dec 13 through Thu, Dec 14, Great Hall & Hall B1+B2 #526
More from the Same Authors
- 2021 Spotlight: Understanding the Under-Coverage Bias in Uncertainty Estimation »
  Yu Bai · Song Mei · Huan Wang · Caiming Xiong
- 2022: Fantastic Rewards and How to Tame Them: A Case Study on Reward Learning for Task-Oriented Dialogue Systems »
  Yihao Feng · Shentao Yang · Shujian Zhang · Jianguo Zhang · Caiming Xiong · Mingyuan Zhou · Huan Wang
- 2023: Deep Networks as Denoising Algorithms: Sample-Efficient Learning of Diffusion Models in High-Dimensional Graphical Models »
  Song Mei · Yuchen Wu
- 2023: Transformers as Decision Makers: Provable In-Context Reinforcement Learning via Supervised Pretraining »
  Licong Lin · Yu Bai · Song Mei
- 2023: How Do Transformers Learn In-Context Beyond Simple Functions? A Case Study on Learning with Representations »
  Tianyu Guo · Wei Hu · Song Mei · Huan Wang · Caiming Xiong · Silvio Savarese · Yu Bai
- 2023 Poster: What can a Single Attention Layer Learn? A Study Through the Random Features Lens »
  Hengyu Fu · Tianyu Guo · Yu Bai · Song Mei
- 2023 Poster: UniControl: A Unified Diffusion Model for Controllable Visual Generation In the Wild »
  Can Qin · Shu Zhang · Ning Yu · Yihao Feng · Xinyi Yang · Yingbo Zhou · Huan Wang · Juan Carlos Niebles · Caiming Xiong · Silvio Savarese · Stefano Ermon · Yun Fu · Ran Xu
- 2023 Poster: Efficient RL with Impaired Observability: Learning to Act with Delayed and Missing State Observations »
  Minshuo Chen · Yu Bai · H. Vincent Poor · Mengdi Wang
- 2023 Poster: Preference-grounded Token-level Guidance for Language Model Fine-tuning »
  Shentao Yang · Shujian Zhang · Congying Xia · Yihao Feng · Caiming Xiong · Mingyuan Zhou
- 2022 Spotlight: Ensemble of Averages: Improving Model Selection and Boosting Performance in Domain Generalization »
  Devansh Arpit · Huan Wang · Yingbo Zhou · Caiming Xiong
- 2022 Spotlight: Lightning Talks 5B-1 »
  Devansh Arpit · Xiaojun Xu · Zifan Shi · Ivan Skorokhodov · Shayan Shekarforoush · Zhan Tong · Yiqun Wang · Shichong Peng · Linyi Li · Ivan Skorokhodov · Huan Wang · Yibing Song · David Lindell · Yinghao Xu · Seyed Alireza Moazenipourasil · Sergey Tulyakov · Peter Wonka · Yiqun Wang · Ke Li · David Fleet · Yujun Shen · Yingbo Zhou · Bo Li · Jue Wang · Peter Wonka · Marcus Brubaker · Caiming Xiong · Limin Wang · Deli Zhao · Qifeng Chen · Dit-Yan Yeung
- 2022: Near-Negative Distinction: Giving a Second Life to Human Evaluation Datasets »
  Philippe Laban · Chien-Sheng Wu · Wenhao Liu · Caiming Xiong
- 2022 Poster: Identifying good directions to escape the NTK regime and efficiently learn low-degree plus sparse polynomials »
  Eshaan Nichani · Yu Bai · Jason Lee
- 2022 Poster: Efficient Phi-Regret Minimization in Extensive-Form Games via Online Mirror Descent »
  Yu Bai · Chi Jin · Song Mei · Ziang Song · Tiancheng Yu
- 2022 Poster: A Near-Optimal Primal-Dual Method for Off-Policy Learning in CMDP »
  Fan Chen · Junyu Zhang · Zaiwen Wen
- 2022 Poster: Policy Optimization for Markov Games: Unified Framework and Faster Convergence »
  Runyu Zhang · Qinghua Liu · Huan Wang · Caiming Xiong · Na Li · Yu Bai
- 2022 Poster: Learning with convolution and pooling operations in kernel methods »
  Theodor Misiakiewicz · Song Mei
- 2022 Poster: Ensemble of Averages: Improving Model Selection and Boosting Performance in Domain Generalization »
  Devansh Arpit · Huan Wang · Yingbo Zhou · Caiming Xiong
- 2022 Poster: Sample-Efficient Learning of Correlated Equilibria in Extensive-Form Games »
  Ziang Song · Song Mei · Yu Bai
- 2021 Poster: Sample-Efficient Learning of Stackelberg Equilibria in General-Sum Games »
  Yu Bai · Chi Jin · Huan Wang · Caiming Xiong
- 2021 Poster: Understanding the Under-Coverage Bias in Uncertainty Estimation »
  Yu Bai · Song Mei · Huan Wang · Caiming Xiong
- 2021 Poster: Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning »
  Tengyang Xie · Nan Jiang · Huan Wang · Caiming Xiong · Yu Bai
- 2021 Poster: Near-Optimal Offline Reinforcement Learning via Double Variance Reduction »
  Ming Yin · Yu Bai · Yu-Xiang Wang
- 2019 Poster: Provably Efficient Q-Learning with Low Switching Cost »
  Yu Bai · Tengyang Xie · Nan Jiang · Yu-Xiang Wang