We demonstrate that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even becoming competitive with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks. We also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora.
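The few-shot setting described above conditions the model on a task description and a handful of solved demonstrations, followed by an unsolved query, all as plain text and with no gradient updates. As a minimal sketch of that prompt format (not code from the paper; the function name, task, and examples are hypothetical illustrations):

```python
def build_few_shot_prompt(task_description, demonstrations, query):
    """Assemble a few-shot prompt: an instruction, K solved examples, then the query.

    The model is conditioned on this text alone -- no fine-tuning occurs.
    """
    lines = [task_description]
    for source, target in demonstrations:
        lines.append(f"{source} => {target}")
    lines.append(f"{query} =>")  # the model is expected to complete this line
    return "\n".join(lines)

# Hypothetical English-to-French task with K = 2 demonstrations.
prompt = build_few_shot_prompt(
    "Translate English to French:",
    [("sea otter", "loutre de mer"), ("cheese", "fromage")],
    "plush giraffe",
)
print(prompt)
```

Varying the number of demonstrations from zero upward is what distinguishes the zero-, one-, and few-shot settings evaluated in the paper.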
Author Information
Tom B Brown (Google Brain)
Research Engineer @ Google Brain
Benjamin Mann (OpenAI)
Nick Ryder (OpenAI)
Melanie Subbiah (OpenAI)
Jared Kaplan (Johns Hopkins University)
Prafulla Dhariwal (OpenAI)
Arvind Neelakantan (OpenAI)
Pranav Shyam (OpenAI)
Girish Sastry (OpenAI)
Amanda Askell (OpenAI)
Sandhini Agarwal (OpenAI)
Ariel Herbert-Voss (OpenAI)
Gretchen M Krueger (OpenAI)
Tom Henighan (OpenAI)
Rewon Child (OpenAI)
Aditya Ramesh (OpenAI)
Daniel Ziegler (OpenAI)
I work at OpenAI on AI alignment: how can we make techniques for learning human values that will scale robustly to superhuman learning systems and task performance?
Jeffrey Wu (OpenAI)
Clemens Winter (OpenAI)
Chris Hesse (OpenAI)
Mark Chen (OpenAI)
Eric Sigler (OpenAI)
Mateusz Litwin (OpenAI)
Scott Gray (OpenAI)
Benjamin Chess (OpenAI)
Jack Clark (OpenAI)
Christopher Berner (OpenAI)
Sam McCandlish (OpenAI)
Alec Radford (OpenAI)
Ilya Sutskever (OpenAI)
Dario Amodei (OpenAI)
Related Events (a corresponding poster, oral, or spotlight)
- 2020 Poster: Language Models are Few-Shot Learners »
  Tue. Dec 8th 05:00 -- 07:00 AM, Room: Poster Session 0 #49
More from the Same Authors
- 2021 Spotlight: Diffusion Models Beat GANs on Image Synthesis »
  Prafulla Dhariwal · Alexander Nichol
- 2022 Poster: Training language models to follow instructions with human feedback »
  Long Ouyang · Jeffrey Wu · Xu Jiang · Diogo Almeida · Carroll Wainwright · Pamela Mishkin · Chong Zhang · Sandhini Agarwal · Katarina Slama · Alex Ray · John Schulman · Jacob Hilton · Fraser Kelton · Luke Miller · Maddie Simens · Amanda Askell · Peter Welinder · Paul Christiano · Jan Leike · Ryan Lowe
- 2021 Poster: Diffusion Models Beat GANs on Image Synthesis »
  Prafulla Dhariwal · Alexander Nichol
- 2021 Panel: The Consequences of Massive Scaling in Machine Learning »
  Noah Goodman · Melanie Mitchell · Joelle Pineau · Oriol Vinyals · Jared Kaplan
- 2020 Workshop: Cooperative AI »
  Thore Graepel · Dario Amodei · Vincent Conitzer · Allan Dafoe · Gillian Hadfield · Eric Horvitz · Sarit Kraus · Kate Larson · Yoram Bachrach
- 2020 Poster: Learning to summarize with human feedback »
  Nisan Stiennon · Long Ouyang · Jeffrey Wu · Daniel Ziegler · Ryan Lowe · Chelsea Voss · Alec Radford · Dario Amodei · Paul Christiano
- 2019: Transfer Learning for Text Generation »
  Alec Radford
- 2019 Workshop: Joint Workshop on AI for Social Good »
  Fei Fang · Joseph Aylett-Bullock · Marc-Antoine Dilhac · Brian Green · Natalie Saltiel · Dhaval Adjodah · Jack Clark · Sean McGregor · Margaux Luck · Jonathan Penn · Tristan Sylvain · Geneviève Boucher · Sydney Swaine-Simon · Girmaw Abebe Tadesse · Myriam Côté · Anna Bethke · Yoshua Bengio
- 2018 Poster: Reward learning from human preferences and demonstrations in Atari »
  Borja Ibarz · Jan Leike · Tobias Pohlen · Geoffrey Irving · Shane Legg · Dario Amodei
- 2018 Poster: The Importance of Sampling in Meta-Reinforcement Learning »
  Bradly Stadie · Ge Yang · Rein Houthooft · Peter Chen · Yan Duan · Yuhuai Wu · Pieter Abbeel · Ilya Sutskever
- 2018 Poster: Glow: Generative Flow with Invertible 1x1 Convolutions »
  Diederik Kingma · Prafulla Dhariwal
- 2017: Small World Network Architectures »
  Scott Gray
- 2017: Future Hardware Directions »
  Gregory Diamos · Jeff Dean · Simon Knowles · Michael James · Scott Gray
- 2017: Invited talk 6 »
  Dario Amodei
- 2017 Poster: Deep Reinforcement Learning from Human Preferences »
  Paul Christiano · Jan Leike · Tom Brown · Miljan Martic · Shane Legg · Dario Amodei
- 2017 Poster: One-Shot Imitation Learning »
  Yan Duan · Marcin Andrychowicz · Bradly Stadie · Jonathan Ho · Jonas Schneider · Ilya Sutskever · Pieter Abbeel · Wojciech Zaremba
- 2016: Welcome »
  David Lopez-Paz · Alec Radford · Leon Bottou
- 2016 Workshop: Adversarial Training »
  David Lopez-Paz · Leon Bottou · Alec Radford
- 2016 Poster: Improved Techniques for Training GANs »
  Tim Salimans · Ian Goodfellow · Wojciech Zaremba · Vicki Cheung · Alec Radford · Peter Chen · Xi Chen
- 2012 Poster: ImageNet Classification with Deep Convolutional Neural Networks »
  Alex Krizhevsky · Ilya Sutskever · Geoffrey E Hinton
- 2012 Spotlight: ImageNet Classification with Deep Convolutional Neural Networks »
  Alex Krizhevsky · Ilya Sutskever · Geoffrey E Hinton
- 2012 Poster: Cardinality Restricted Boltzmann Machines »
  Kevin Swersky · Danny Tarlow · Ilya Sutskever · Richard Zemel · Russ Salakhutdinov · Ryan Adams
- 2009 Poster: Modelling Relational Data using Bayesian Clustered Tensor Factorization »
  Ilya Sutskever · Russ Salakhutdinov · Josh Tenenbaum
- 2008 Poster: Using matrices to model symbolic relationship »
  Ilya Sutskever · Geoffrey E Hinton
- 2008 Spotlight: Using matrices to model symbolic relationship »
  Ilya Sutskever · Geoffrey E Hinton
- 2008 Poster: The Recurrent Temporal Restricted Boltzmann Machine »
  Ilya Sutskever · Geoffrey E Hinton · Graham Taylor