We explore how generating a chain of thought, a series of intermediate reasoning steps, significantly improves the ability of large language models to perform complex reasoning. In particular, we show how such reasoning abilities emerge naturally in sufficiently large language models via a simple method called chain of thought prompting, where a few chain of thought demonstrations are provided as exemplars in prompting. Experiments on three large language models show that chain of thought prompting improves performance on a range of arithmetic, commonsense, and symbolic reasoning tasks. The empirical gains can be striking. For instance, prompting a 540B-parameter language model with just eight chain of thought exemplars achieves state-of-the-art accuracy on the GSM8K benchmark of math word problems, surpassing even finetuned GPT-3 with a verifier.
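As a rough illustration of the prompting format described in the abstract, the sketch below assembles a few-shot prompt in which each exemplar pairs a question with its intermediate reasoning steps and final answer. The `Exemplar` class, the `build_cot_prompt` helper, the "Q:/A:" layout, and the sample word problems are all assumptions made for this example; they are not taken from the paper, and no model is called.

```python
from dataclasses import dataclass


@dataclass
class Exemplar:
    question: str
    chain_of_thought: str  # intermediate reasoning steps in natural language
    answer: str


def build_cot_prompt(exemplars, question):
    """Join worked exemplars, then append the new question with an open answer slot."""
    blocks = [
        f"Q: {ex.question}\nA: {ex.chain_of_thought} The answer is {ex.answer}."
        for ex in exemplars
    ]
    # The final block leaves "A:" open so the model continues with its own reasoning.
    blocks.append(f"Q: {question}\nA:")
    return "\n\n".join(blocks)


# Hypothetical exemplar written for this sketch (not from the paper).
EXEMPLARS = [
    Exemplar(
        question="A shelf holds 3 boxes with 4 pens each. 2 pens are removed. "
                 "How many pens remain?",
        chain_of_thought="3 boxes of 4 pens is 3 * 4 = 12 pens. "
                         "Removing 2 leaves 12 - 2 = 10.",
        answer="10",
    ),
]

prompt = build_cot_prompt(
    EXEMPLARS, "A farmer has 15 eggs and sells 6. How many are left?"
)
print(prompt)
```

The resulting string would be sent to a language model as-is. The abstract's GSM8K example uses eight such exemplars, which here would simply mean a longer `EXEMPLARS` list.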
Author Information
Jason Wei (Google Brain)
Xuezhi Wang (Google)
Dale Schuurmans (Google Brain & University of Alberta)
Maarten Bosma (Google)
brian ichter (Google)
Fei Xia (Google)
Ed Chi (Google Inc.)
Ed H. Chi is a Principal Scientist at Google, leading several machine learning research teams focusing on neural modeling, inclusive ML, reinforcement learning, and recommendation systems in the Google Brain team. He has delivered significant improvements for YouTube, News, Ads, and the Google Play Store at Google with >325 product launches in the last 6 years. With 39 patents and over 120 research articles, he is also known for research on user behavior in web and social media. Prior to Google, he was the Area Manager and a Principal Scientist at Palo Alto Research Center's Augmented Social Cognition Group, where he led the team in understanding how social systems help groups of people to remember, think and reason. Ed completed his three degrees (B.S., M.S., and Ph.D.) in 6.5 years from the University of Minnesota. Recognized as an ACM Distinguished Scientist and elected into the CHI Academy, he recently received a 20-year Test of Time award for research in information visualization. He has been featured and quoted in the press, including the Economist, Time Magazine, LA Times, and the Associated Press. An avid swimmer, photographer and snowboarder in his spare time, he also has a black belt in Taekwondo.
Quoc V Le (Google)
Denny Zhou (Google)
More from the Same Authors
2021 Spotlight: Combiner: Full Attention Transformer with Sparse Computation Cost »
Hongyu Ren · Hanjun Dai · Zihang Dai · Mengjiao (Sherry) Yang · Jure Leskovec · Dale Schuurmans · Bo Dai -
2021 : Understanding and Improving Robustness of Vision Transformers through patch-based Negative Augmentation »
Yao Qin · Chiyuan Zhang · Ting Chen · Balaji Lakshminarayanan · Alex Beutel · Xuezhi Wang -
2021 : Value Function Spaces: Skill-Centric State Abstractions for Long-Horizon Reasoning »
Dhruv Shah · Ted Xiao · Alexander Toshev · Sergey Levine · brian ichter -
2021 : Offline Policy Selection under Uncertainty »
Mengjiao (Sherry) Yang · Bo Dai · Ofir Nachum · George Tucker · Dale Schuurmans -
2022 : Towards Companion Recommendation Systems »
Konstantina Christakopoulou · Yuyan Wang · Ed Chi · Minmin Chen -
2022 : A Contextual Bandit Approach for Learning to Plan in Environments with Probabilistic Goal Configurations »
Sohan Rudra · Saksham Goel · Anirban Santara · Claudio Gentile · Laurent Perron · Fei Xia · Vikas Sindhwani · Carolina Parada · Gaurav Aggarwal -
2022 Spotlight: Improving Multi-Task Generalization via Regularizing Spurious Correlation »
Ziniu Hu · Zhe Zhao · Xinyang Yi · Tiansheng Yao · Lichan Hong · Yizhou Sun · Ed Chi -
2022 Poster: Improving Multi-Task Generalization via Regularizing Spurious Correlation »
Ziniu Hu · Zhe Zhao · Xinyang Yi · Tiansheng Yao · Lichan Hong · Yizhou Sun · Ed Chi -
2022 Poster: A Simple Decentralized Cross-Entropy Method »
Zichen Zhang · Jun Jin · Martin Jagersand · Jun Luo · Dale Schuurmans -
2022 Poster: Chain of Thought Imitation with Procedure Cloning »
Mengjiao (Sherry) Yang · Dale Schuurmans · Pieter Abbeel · Ofir Nachum -
2022 Poster: Optimal Scaling for Locally Balanced Proposals in Discrete Spaces »
Haoran Sun · Hanjun Dai · Dale Schuurmans -
2022 Poster: Back Razor: Memory-Efficient Transfer Learning by Self-Sparsified Backpropagation »
Ziyu Jiang · Xuxi Chen · Xueqin Huang · Xianzhi Du · Denny Zhou · Zhangyang Wang -
2022 Poster: The Role of Baselines in Policy Gradient Optimization »
Jincheng Mei · Wesley Chung · Valentin Thomas · Bo Dai · Csaba Szepesvari · Dale Schuurmans -
2022 Poster: Mixture-of-Experts with Expert Choice Routing »
Yanqi Zhou · Tao Lei · Hanxiao Liu · Nan Du · Yanping Huang · Vincent Zhao · Andrew Dai · zhifeng Chen · Quoc V Le · James Laudon -
2022 Poster: Understanding and Improving Robustness of Vision Transformers through Patch-based Negative Augmentation »
Yao Qin · Chiyuan Zhang · Ting Chen · Balaji Lakshminarayanan · Alex Beutel · Xuezhi Wang -
2022 Poster: TabNAS: Rejection Sampling for Neural Architecture Search on Tabular Datasets »
Chengrun Yang · Gabriel Bender · Hanxiao Liu · Pieter-Jan Kindermans · Madeleine Udell · Yifeng Lu · Quoc V Le · Da Huang -
2022 Poster: On the Global Convergence Rates of Decentralized Softmax Gradient Play in Markov Potential Games »
Runyu Zhang · Jincheng Mei · Bo Dai · Dale Schuurmans · Na Li -
2021 : Dale Schuurmans Talk Q&A »
Dale Schuurmans -
2021 : Invited Talk: Dale Schuurmans - Understanding Deep Value Estimation »
Dale Schuurmans -
2021 Poster: DSelect-k: Differentiable Selection in the Mixture of Experts with Applications to Multi-Task Learning »
Hussein Hazimeh · Zhe Zhao · Aakanksha Chowdhery · Maheswaran Sathiamoorthy · Yihua Chen · Rahul Mazumder · Lichan Hong · Ed Chi -
2021 Poster: Combiner: Full Attention Transformer with Sparse Computation Cost »
Hongyu Ren · Hanjun Dai · Zihang Dai · Mengjiao (Sherry) Yang · Jure Leskovec · Dale Schuurmans · Bo Dai -
2021 Poster: CoAtNet: Marrying Convolution and Attention for All Data Sizes »
Zihang Dai · Hanxiao Liu · Quoc V Le · Mingxing Tan -
2021 Poster: Searching for Efficient Transformers for Language Modeling »
David So · Wojciech Mańke · Hanxiao Liu · Zihang Dai · Noam Shazeer · Quoc V Le -
2021 Poster: Understanding the Effect of Stochasticity in Policy Optimization »
Jincheng Mei · Bo Dai · Chenjun Xiao · Csaba Szepesvari · Dale Schuurmans -
2021 Poster: Pay Attention to MLPs »
Hanxiao Liu · Zihang Dai · David So · Quoc V Le -
2021 Poster: Improving Calibration through the Relationship with Adversarial Robustness »
Yao Qin · Xuezhi Wang · Alex Beutel · Ed Chi -
2020 : Panel Discussion & Closing »
Yejin Choi · Alexei Efros · Chelsea Finn · Kristen Grauman · Quoc V Le · Yann LeCun · Ruslan Salakhutdinov · Eric Xing -
2020 : Invited Speaker: Ed Chi »
Ed Chi -
2020 Poster: Evolving Normalization-Activation Layers »
Hanxiao Liu · Andy Brock · Karen Simonyan · Quoc V Le -
2020 Spotlight: Evolving Normalization-Activation Layers »
Hanxiao Liu · Andy Brock · Karen Simonyan · Quoc V Le -
2020 Poster: PyGlove: Symbolic Programming for Automated Machine Learning »
Daiyi Peng · Xuanyi Dong · Esteban Real · Mingxing Tan · Yifeng Lu · Gabriel Bender · Hanxiao Liu · Adam Kraft · Chen Liang · Quoc V Le -
2020 Poster: RandAugment: Practical Automated Data Augmentation with a Reduced Search Space »
Ekin Dogus Cubuk · Barret Zoph · Jonathon Shlens · Quoc V Le -
2020 Oral: PyGlove: Symbolic Programming for Automated Machine Learning »
Daiyi Peng · Xuanyi Dong · Esteban Real · Mingxing Tan · Yifeng Lu · Gabriel Bender · Hanxiao Liu · Adam Kraft · Chen Liang · Quoc V Le -
2020 Poster: Fairness without Demographics through Adversarially Reweighted Learning »
Preethi Lahoti · Alex Beutel · Jilin Chen · Kang Lee · Flavien Prost · Nithum Thain · Xuezhi Wang · Ed Chi -
2020 Poster: Rethinking Pre-training and Self-training »
Barret Zoph · Golnaz Ghiasi · Tsung-Yi Lin · Yin Cui · Hanxiao Liu · Ekin Dogus Cubuk · Quoc V Le -
2020 Oral: Rethinking Pre-training and Self-training »
Barret Zoph · Golnaz Ghiasi · Tsung-Yi Lin · Yin Cui · Hanxiao Liu · Ekin Dogus Cubuk · Quoc V Le -
2020 Poster: Unsupervised Data Augmentation for Consistency Training »
Qizhe Xie · Zihang Dai · Eduard Hovy · Thang Luong · Quoc V Le -
2020 Poster: Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing »
Zihang Dai · Guokun Lai · Yiming Yang · Quoc V Le -
2019 : Poster Spotlight 2 »
Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare -
2019 : Poster and Coffee Break 1 »
Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova -
2019 Poster: XLNet: Generalized Autoregressive Pretraining for Language Understanding »
Zhilin Yang · Zihang Dai · Yiming Yang · Jaime Carbonell · Russ Salakhutdinov · Quoc V Le -
2019 Oral: XLNet: Generalized Autoregressive Pretraining for Language Understanding »
Zhilin Yang · Zihang Dai · Yiming Yang · Jaime Carbonell · Russ Salakhutdinov · Quoc V Le -
2019 Poster: CondConv: Conditionally Parameterized Convolutions for Efficient Inference »
Brandon Yang · Gabriel Bender · Quoc V Le · Jiquan Ngiam -
2019 Poster: Maximum Entropy Monte-Carlo Planning »
Chenjun Xiao · Ruitong Huang · Jincheng Mei · Dale Schuurmans · Martin Müller -
2019 Poster: Mixtape: Breaking the Softmax Bottleneck Efficiently »
Zhilin Yang · Thang Luong · Russ Salakhutdinov · Quoc V Le -
2019 Poster: Surrogate Objectives for Batch Policy Optimization in One-step Decision Making »
Minmin Chen · Ramki Gummadi · Chris Harris · Dale Schuurmans -
2019 Poster: Saccader: Improving Accuracy of Hard Attention Models for Vision »
Gamaleldin Elsayed · Simon Kornblith · Quoc V Le -
2019 Poster: GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism »
Yanping Huang · Youlong Cheng · Ankur Bapna · Orhan Firat · Dehao Chen · Mia Chen · HyoukJoong Lee · Jiquan Ngiam · Quoc V Le · Yonghui Wu · zhifeng Chen -
2019 Poster: Invertible Convolutional Flow »
Mahdi Karami · Dale Schuurmans · Jascha Sohl-Dickstein · Laurent Dinh · Daniel Duckworth -
2019 Spotlight: Invertible Convolutional Flow »
Mahdi Karami · Dale Schuurmans · Jascha Sohl-Dickstein · Laurent Dinh · Daniel Duckworth -
2019 Poster: High Fidelity Video Prediction with Large Stochastic Recurrent Neural Networks »
Ruben Villegas · Arkanath Pathak · Harini Kannan · Dumitru Erhan · Quoc V Le · Honglak Lee -
2018 : Poster Session (All Posters) »
Stephen Macke · Hongzi Mao · Caroline Lemieux · Saim Salman · Rishikesh Jha · Hanrui Wang · Shoumik P Palkar · Tianqi Chen · Thomas Pumir · Vaishnav Janardhan · adit bhardwaj · Ed Chi -
2018 : Off-policy Policy Optimization (Dale Schuurmans) »
Dale Schuurmans -
2018 Poster: Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing »
Chen Liang · Mohammad Norouzi · Jonathan Berant · Quoc V Le · Ni Lao -
2018 Spotlight: Memory Augmented Policy Optimization for Program Synthesis and Semantic Parsing »
Chen Liang · Mohammad Norouzi · Jonathan Berant · Quoc V Le · Ni Lao -
2018 Poster: DropBlock: A regularization method for convolutional networks »
Golnaz Ghiasi · Tsung-Yi Lin · Quoc V Le -
2017 : Ed Chi (Google) on Learned Deep Retrieval for Recommenders »
Ed Chi -
2017 Symposium: Metalearning »
Risto Miikkulainen · Quoc V Le · Kenneth Stanley · Chrisantha Fernando -
2017 Poster: Bridging the Gap Between Value and Policy Based Reinforcement Learning »
Ofir Nachum · Mohammad Norouzi · Kelvin Xu · Dale Schuurmans -
2017 Poster: Multi-view Matrix Factorization for Linear Dynamical System Estimation »
Mahdi Karami · Martha White · Dale Schuurmans · Csaba Szepesvari -
2016 Poster: Deep Learning Games »
Dale Schuurmans · Martin A Zinkevich -
2016 Poster: An Online Sequence-to-Sequence Model Using Partial Conditioning »
Navdeep Jaitly · Quoc V Le · Oriol Vinyals · Ilya Sutskever · David Sussillo · Samy Bengio -
2016 Poster: Reward Augmented Maximum Likelihood for Neural Structured Prediction »
Mohammad Norouzi · Samy Bengio · zhifeng Chen · Navdeep Jaitly · Mike Schuster · Yonghui Wu · Dale Schuurmans -
2015 Poster: Semi-supervised Sequence Learning »
Andrew Dai · Quoc V Le -
2015 Poster: Embedding Inference for Structured Multilabel Prediction »
Farzaneh Mirzazadeh · Siamak Ravanbakhsh · Nan Ding · Dale Schuurmans -
2014 Workshop: Representation and Learning Methods for Complex Outputs »
Richard Zemel · Dale Schuurmans · Kilian Q Weinberger · Yuhong Guo · Jia Deng · Francesco Dinuzzo · Hal Daumé III · Honglak Lee · Noah A Smith · Richard Sutton · Jiaqian YU · Vitaly Kuznetsov · Luke Vilnis · Hanchen Xiong · Calvin Murdock · Thomas Unterthiner · Jean-Francis Roy · Martin Renqiang Min · Hichem SAHBI · Fabio Massimo Zanzotto -
2014 Poster: Convex Deep Learning via Normalized Kernels »
Özlem Aslan · Xinhua Zhang · Dale Schuurmans -
2014 Poster: Sequence to Sequence Learning with Neural Networks »
Ilya Sutskever · Oriol Vinyals · Quoc V Le -
2014 Oral: Sequence to Sequence Learning with Neural Networks »
Ilya Sutskever · Oriol Vinyals · Quoc V Le -
2013 Workshop: Randomized Methods for Machine Learning »
David Lopez-Paz · Quoc V Le · Alexander Smola -
2013 Workshop: Output Representation Learning »
Yuhong Guo · Dale Schuurmans · Richard Zemel · Samy Bengio · Yoshua Bengio · Li Deng · Dan Roth · Kilian Q Weinberger · Jason Weston · Kihyuk Sohn · Florent Perronnin · Gabriel Synnaeve · Pablo R Strasser · julien audiffren · Carlo Ciliberto · Dan Goldwasser -
2013 Poster: Convex Two-Layer Modeling »
Özlem Aslan · Hao Cheng · Xinhua Zhang · Dale Schuurmans -
2013 Spotlight: Convex Two-Layer Modeling »
Özlem Aslan · Hao Cheng · Xinhua Zhang · Dale Schuurmans -
2013 Poster: Polar Operators for Structured Sparse Estimation »
Xinhua Zhang · Yao-Liang Yu · Dale Schuurmans -
2012 Poster: Convex Multi-view Subspace Learning »
Martha White · Yao-Liang Yu · Xinhua Zhang · Dale Schuurmans -
2012 Poster: Accelerated Training for Matrix-norm Regularization: A Boosting Approach »
Xinhua Zhang · Yao-Liang Yu · Dale Schuurmans -
2012 Poster: A Polynomial-time Form of Robust Regression »
Yao-Liang Yu · Özlem Aslan · Dale Schuurmans -
2010 Poster: Relaxed Clipping: A Global Training Method for Robust Regression and Classification »
Yao-Liang Yu · Min Yang · Linli Xu · Martha White · Dale Schuurmans -
2009 Poster: Convex Relaxation of Mixture Regression with Efficient Algorithms »
Novi Quadrianto · Tiberio Caetano · John Lim · Dale Schuurmans -
2009 Poster: A General Projection Property for Distribution Families »
Yao-Liang Yu · Yuxi Li · Dale Schuurmans · Csaba Szepesvari -
2007 Spotlight: Stable Dual Dynamic Programming »
Tao Wang · Daniel Lizotte · Michael Bowling · Dale Schuurmans -
2007 Poster: Stable Dual Dynamic Programming »
Tao Wang · Daniel Lizotte · Michael Bowling · Dale Schuurmans -
2007 Session: Spotlights »
Dale Schuurmans -
2007 Poster: Convex Relaxations of EM »
Yuhong Guo · Dale Schuurmans -
2007 Poster: Discriminative Batch Mode Active Learning »
Yuhong Guo · Dale Schuurmans -
2006 Poster: Learning to Model Spatial Dependency: Semi-Supervised Discriminative Random Fields »
Chi-Hoon Lee · Shaojun Wang · Feng Jiao · Dale Schuurmans · Russell Greiner -
2006 Poster: Implicit Online Learning with Kernels »
Li Cheng · Vishwanathan S V N · Dale Schuurmans · Shaojun Wang · Terry Caelli