Timezone: »
Pretrained language models have achieved state-of-the-art performance when adapted to a downstream NLP task. However, theoretical analysis of these models is scarce and challenging since the pretraining and downstream tasks can be very different. We propose an analysis framework that links the pretraining and downstream tasks with an underlying latent variable generative model of text -- the downstream classifier must recover a function of the posterior distribution over the latent variables. We analyze head tuning (learning a classifier on top of the frozen pretrained model) and prompt tuning in this setting. The generative model in our analysis is either a Hidden Markov Model (HMM) or an HMM augmented with a latent memory component, motivated by long-term dependencies in natural language. We show that 1) under certain non-degeneracy conditions on the HMM, simple classification heads can solve the downstream task, 2) prompt tuning obtains downstream guarantees with weaker non-degeneracy conditions, and 3) our recovery guarantees for the memory-augmented HMM are stronger than for the vanilla HMM because task-relevant information is easier to recover from the long-term memory. Experiments on synthetically generated data from HMMs back our theoretical findings.
Author Information
Colin Wei (Stanford University)
Sang Michael Xie (Stanford University)
Tengyu Ma (Stanford University)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Poster: Why Do Pretrained Language Models Help in Downstream Tasks? An Analysis of Head and Prompt Tuning »
Thu. Dec 9th 12:30 -- 02:00 AM Room
More from the Same Authors
-
2021 : Ensembles and Cocktails: Robust Finetuning for Natural Language Generation »
John Hewitt · Xiang Li · Sang Michael Xie · Benjamin Newman · Percy Liang -
2021 : Calibrated Ensembles: A Simple Way to Mitigate ID-OOD Accuracy Tradeoffs »
Ananya Kumar · Aditi Raghunathan · Tengyu Ma · Percy Liang -
2021 : How Does Contrastive Pre-training Connect Disparate Domains? »
Kendrick Shen · Robert Jones · Ananya Kumar · Sang Michael Xie · Percy Liang -
2021 : Extending the WILDS Benchmark for Unsupervised Adaptation »
Shiori Sagawa · Pang Wei Koh · Tony Lee · Irena Gao · Sang Michael Xie · Kendrick Shen · Ananya Kumar · Weihua Hu · Michihiro Yasunaga · Henrik Marklund · Sara Beery · Ian Stavness · Jure Leskovec · Kate Saenko · Tatsunori Hashimoto · Sergey Levine · Chelsea Finn · Percy Liang -
2021 : Self-supervised Learning is More Robust to Dataset Imbalance »
Hong Liu · Jeff Z. HaoChen · Adrien Gaidon · Tengyu Ma -
2021 : Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification »
Ling Pan · Longbo Huang · Tengyu Ma · Huazhe Xu -
2021 : DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization »
Aviral Kumar · Rishabh Agarwal · Tengyu Ma · Aaron Courville · George Tucker · Sergey Levine -
2022 Poster: Statistically Meaningful Approximation: a Case Study on Approximating Turing Machines with Transformers »
Colin Wei · Yining Chen · Tengyu Ma -
2022 Poster: Beyond Separability: Analyzing the Linear Transferability of Contrastive Representations to Related Subpopulations »
Jeff Z. HaoChen · Colin Wei · Ananya Kumar · Tengyu Ma -
2021 : DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization Q&A »
Aviral Kumar · Rishabh Agarwal · Tengyu Ma · Aaron Courville · George Tucker · Sergey Levine -
2021 : DR3: Value-Based Deep Reinforcement Learning Requires Explicit Regularization »
Aviral Kumar · Rishabh Agarwal · Tengyu Ma · Aaron Courville · George Tucker · Sergey Levine -
2021 Poster: Label Noise SGD Provably Prefers Flat Global Minimizers »
Alex Damian · Tengyu Ma · Jason Lee -
2021 Poster: Learning Barrier Certificates: Towards Safe Reinforcement Learning with Zero Training-time Violations »
Yuping Luo · Tengyu Ma -
2021 Poster: Calibrating Predictions to Decisions: A Novel Approach to Multi-Class Calibration »
Shengjia Zhao · Michael Kim · Roshni Sahoo · Tengyu Ma · Stefano Ermon -
2021 Oral: Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss »
Jeff Z. HaoChen · Colin Wei · Adrien Gaidon · Tengyu Ma -
2021 Poster: Safe Reinforcement Learning by Imagining the Near Future »
Garrett Thomas · Yuping Luo · Tengyu Ma -
2021 Poster: Provable Guarantees for Self-Supervised Deep Learning with Spectral Contrastive Loss »
Jeff Z. HaoChen · Colin Wei · Adrien Gaidon · Tengyu Ma -
2021 Poster: Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature »
Kefan Dong · Jiaqi Yang · Tengyu Ma -
2020 Poster: Self-training Avoids Using Spurious Features Under Domain Shift »
Yining Chen · Colin Wei · Ananya Kumar · Tengyu Ma -
2019 : Tengyu Ma, "Designing Explicit Regularizers for Deep Models" »
Tengyu Ma -
2019 Poster: Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss »
Kaidi Cao · Colin Wei · Adrien Gaidon · Nikos Arechiga · Tengyu Ma -
2019 Poster: Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel »
Colin Wei · Jason Lee · Qiang Liu · Tengyu Ma -
2019 Spotlight: Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel »
Colin Wei · Jason Lee · Qiang Liu · Tengyu Ma -
2019 Poster: Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation »
Colin Wei · Tengyu Ma -
2019 Poster: Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks »
Yuanzhi Li · Colin Wei · Tengyu Ma -
2019 Spotlight: Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation »
Colin Wei · Tengyu Ma -
2019 Spotlight: Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks »
Yuanzhi Li · Colin Wei · Tengyu Ma -
2018 : Poster Session »
Sujay Sanghavi · Vatsal Shah · Yanyao Shen · Tianchen Zhao · Yuandong Tian · Tomer Galanti · Mufan Li · Gilad Cohen · Daniel Rothchild · Aristide Baratin · Devansh Arpit · Vagelis Papalexakis · Michael Perlmutter · Ashok Vardhan Makkuva · Pim de Haan · Yingyan Lin · Wanmo Kang · Cheolhyoung Lee · Hao Shen · Sho Yaida · Dan Roberts · Nadav Cohen · Philippe Casgrain · Dejiao Zhang · Tengyu Ma · Avinash Ravichandran · Julian Emilio Salazar · Bo Li · Davis Liang · Christopher Wong · Glen Bigan Mbeng · Animesh Garg -
2017 Poster: On the Optimization Landscape of Tensor Decompositions »
Rong Ge · Tengyu Ma -
2017 Oral: On the Optimization Landscape of Tensor Decompositions »
Rong Ge · Tengyu Ma -
2016 Oral: Matrix Completion has No Spurious Local Minimum »
Rong Ge · Jason Lee · Tengyu Ma -
2016 Poster: Matrix Completion has No Spurious Local Minimum »
Rong Ge · Jason Lee · Tengyu Ma -
2016 Poster: A Non-generative Framework and Convex Relaxations for Unsupervised Learning »
Elad Hazan · Tengyu Ma -
2015 Poster: Sum-of-Squares Lower Bounds for Sparse PCA »
Tengyu Ma · Avi Wigderson -
2014 Poster: On Communication Cost of Distributed Statistical Estimation and Dimensionality »
Ankit Garg · Tengyu Ma · Huy Nguyen -
2014 Oral: On Communication Cost of Distributed Statistical Estimation and Dimensionality »
Ankit Garg · Tengyu Ma · Huy Nguyen