Recent work has shown that prompting language models to generate reasoning steps improves performance on many reasoning tasks. When moving beyond prompting, this raises the question of how we should supervise the finetuning of such models: outcome-based approaches which supervise the final result, or process-based approaches which supervise the reasoning process itself? Differences between these approaches might naturally be expected not just in final-answer errors but also in reasoning errors, which can be difficult to detect and are problematic in many real-world domains such as education. We run the first comprehensive comparison between process- and outcome-based approaches trained on a natural language task, GSM8K. We find that pure outcome-based supervision produces similar final-answer error rates with less label supervision. However, for correct reasoning steps we find it necessary to use process-based supervision or supervision from learned reward models that emulate process-based feedback. In total, we improve the previous best results from 16.8% to 12.7% final-answer error and from 14.0% to 3.4% reasoning error among final-answer-correct solutions.
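The two metrics reported above can be made concrete with a minimal sketch. It is illustrative only and is not the authors' evaluation code: the `Solution` class, its field names, and both function names are hypothetical, and it assumes each solution carries an outcome label (is the final answer correct?) plus one process label per reasoning step. Counting a solution as a reasoning error whenever any step is flagged incorrect is also an assumption made here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Solution:
    """Hypothetical record for one model-generated solution."""
    final_answer_correct: bool  # outcome label: does the final answer match?
    step_labels: List[bool]     # process labels: one correctness flag per step


def final_answer_error(solutions: List[Solution]) -> float:
    """Fraction of all solutions whose final answer is wrong."""
    wrong = sum(not s.final_answer_correct for s in solutions)
    return wrong / len(solutions)


def reasoning_error_among_correct(solutions: List[Solution]) -> float:
    """Fraction of final-answer-correct solutions with at least one bad step."""
    correct = [s for s in solutions if s.final_answer_correct]
    if not correct:
        return 0.0  # assumption: report 0 when no solution has a correct answer
    flawed = sum(not all(s.step_labels) for s in correct)
    return flawed / len(correct)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    toy = [
        Solution(True, [True, True]),    # right answer, sound reasoning
        Solution(True, [True, False]),   # right answer, flawed reasoning
        Solution(False, [False]),        # wrong answer
        Solution(True, [True]),          # right answer, sound reasoning
    ]
    print(final_answer_error(toy))
    print(reasoning_error_among_correct(toy))
```

The second function shows why the two metrics can diverge: a solution can reach the right final answer through a flawed step, which outcome labels alone would not reveal.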
Author Information
Jonathan Uesato (DeepMind)
Nate Kushman (DeepMind)
Ramana Kumar (DeepMind)
H. Francis Song (DeepMind)
Noah Siegel (DeepMind)
Lisa Wang (Google)
Antonia Creswell (Imperial College London)
Antonia Creswell is a Senior Research Scientist at DeepMind in the Cognition team. Her work focuses on the learning and integration of object representations in dynamic models. She completed her PhD on representation learning at Imperial College London in the department of Bioengineering.
Geoffrey Irving (Google)
Irina Higgins (DeepMind)
More from the Same Authors
- 2021 : Which priors matter? Benchmarking models for learning latent dynamics
  Aleksandar Botev · Andrew Jaegle · Peter Wirnsberger · Daniel Hennes · Irina Higgins
- 2022 : Solving Math Word Problems with Process-based and Outcome-based Feedback
  Jonathan Uesato · Nate Kushman · Ramana Kumar · H. Francis Song · Noah Siegel · Lisa Wang · Antonia Creswell · Geoffrey Irving · Irina Higgins
- 2022 : Panel Discussion I: Geometric and topological principles for representation learning in ML
  Irina Higgins · Taco Cohen · Erik Bekkers · Nina Miolane · Rose Yu
- 2022 : Symmetry-Based Representations for Artificial and Biological Intelligence
  Irina Higgins
- 2022 Workshop: Information-Theoretic Principles in Cognitive Systems
  Noga Zaslavsky · Mycal Tucker · Sarah Marzen · Irina Higgins · Stephanie Palmer · Samuel J Gershman
- 2021 : Invited Talk #3 - Disentanglement for Controllable Image Generation (Irina Higgins)
  Irina Higgins
- 2021 Poster: SyMetric: Measuring the Quality of Learnt Hamiltonian Dynamics Inferred from Vision
  Irina Higgins · Peter Wirnsberger · Andrew Jaegle · Aleksandar Botev
- 2021 Poster: Unsupervised Object-Based Transition Models For 3D Partially Observable Environments
  Antonia Creswell · Rishabh Kabra · Chris Burgess · Murray Shanahan
- 2021 Poster: SIMONe: View-Invariant, Temporally-Abstracted Object Representations via Unsupervised Video Decomposition
  Rishabh Kabra · Daniel Zoran · Goker Erdogan · Loic Matthey · Antonia Creswell · Matt Botvinick · Alexander Lerchner · Chris Burgess
- 2021 : Balancing Structure In NeuroSymbolic Methods
  Antonia Creswell
- 2021 Tutorial: Pay Attention to What You Need: Do Structural Priors Still Matter in the Age of Billion Parameter Models?
  Irina Higgins · Antonia Creswell · Sébastien Racanière
- 2021 : Why do we Need Structure and Where does it Come From?
  Irina Higgins
- 2020 : Invited Talk: Irina Higgins
  Irina Higgins
- 2020 : Panel Discussion
  Jessica Hamrick · Klaus Greff · Michelle A. Lee · Irina Higgins · Josh Tenenbaum
- 2020 Workshop: Object Representations for Learning and Reasoning
  William Agnew · Rim Assouel · Michael Chang · Antonia Creswell · Eliza Kosoy · Aravind Rajeswaran · Sjoerd van Steenkiste
- 2020 Poster: Critic Regularized Regression
  Ziyu Wang · Alexander Novikov · Konrad Zolna · Josh Merel · Jost Tobias Springenberg · Scott Reed · Bobak Shahriari · Noah Siegel · Caglar Gulcehre · Nicolas Heess · Nando de Freitas
- 2020 Poster: Disentangling by Subspace Diffusion
  David Pfau · Irina Higgins · Alex Botev · Sébastien Racanière
- 2020 Poster: Enabling certification of verification-agnostic networks via memory-efficient semidefinite programming
  Sumanth Dathathri · Krishnamurthy Dvijotham · Alexey Kurakin · Aditi Raghunathan · Jonathan Uesato · Rudy Bunel · Shreya Shankar · Jacob Steinhardt · Ian Goodfellow · Percy Liang · Pushmeet Kohli
- 2019 : Panel Discussion: What sorts of cognitive or biological (architectural) inductive biases will be crucial for developing effective artificial intelligence?
  Irina Higgins · Talia Konkle · Matthias Bethge · Nikolaus Kriegeskorte
- 2019 : What is disentangling and does intelligence need it?
  Irina Higgins
- 2019 Poster: Are Labels Required for Improving Adversarial Robustness?
  Jean-Baptiste Alayrac · Jonathan Uesato · Po-Sen Huang · Alhussein Fawzi · Robert Stanforth · Pushmeet Kohli
- 2018 : Invited Talk 3
  Irina Higgins
- 2018 Workshop: Wordplay: Reinforcement and Language Learning in Text-based Games
  Adam Trischler · Angeliki Lazaridou · Yonatan Bisk · Wendy Tay · Nate Kushman · Marc-Alexandre Côté · Alessandro Sordoni · Daniel Ricks · Tom Zahavy · Hal Daumé III
- 2018 Poster: Life-Long Disentangled Representation Learning with Cross-Domain Latent Homologies
  Alessandro Achille · Tom Eccles · Loic Matthey · Chris Burgess · Nicholas Watters · Alexander Lerchner · Irina Higgins
- 2018 Spotlight: Life-Long Disentangled Representation Learning with Cross-Domain Latent Homologies
  Alessandro Achille · Tom Eccles · Loic Matthey · Chris Burgess · Nicholas Watters · Alexander Lerchner · Irina Higgins
- 2018 Poster: Constructing Unrestricted Adversarial Examples with Generative Models
  Yang Song · Rui Shu · Nate Kushman · Stefano Ermon
- 2017 : Irina Higgins
  Irina Higgins
- 2016 Poster: DeepMath - Deep Sequence Models for Premise Selection
  Geoffrey Irving · Christian Szegedy · Alexander Alemi · Niklas Een · Francois Chollet · Josef Urban