Timezone: »
We propose an algorithm for meta-optimization that lets the meta-learner teach itself. The algorithm first bootstraps a target from the meta-learner, then optimises the meta-learner by minimising the distance to that target under some loss. Focusing on meta-learning with gradients, we establish conditions that guarantee performance improvements and show that the improvement is related to the target distance. Thus, by controlling curvature, the distance measure can be used to ease meta-optimization. Further, the bootstrapping mechanism can extend the effective meta-learning horizon without requiring backpropagation through all updates. The algorithm is versatile and easy to implement. We achieve a new state-of-the art for model-free agents on the Atari ALE benchmark, improve upon MAML in few-shot learning, and demonstrate how our approach opens up new possibilities by meta-learning efficient exploration in an epsilon-greedy Q-learning agent.
Author Information
Sebastian Flennerhag (DeepMind)
Ph.D. candidate in Deep Learning, focusing on network adaptation in transfer learning, meta learning and sequence learning.
Yannick Schroecker (Georgia Institute of Technology)
Tom Zahavy (Deepmind)
Hado van Hasselt (DeepMind)
David Silver (DeepMind)
Satinder Singh (DeepMind)
More from the Same Authors
-
2021 Spotlight: Proper Value Equivalence »
Christopher Grimm · Andre Barreto · Greg Farquhar · David Silver · Satinder Singh -
2021 Spotlight: Reward is enough for convex MDPs »
Tom Zahavy · Brendan O'Donoghue · Guillaume Desjardins · Satinder Singh -
2021 Spotlight: Online and Offline Reinforcement Learning by Planning with a Learned Model »
Julian Schrittwieser · Thomas Hubert · Amol Mandhane · Mohammadamin Barekatain · Ioannis Antonoglou · David Silver -
2021 : GrASP: Gradient-Based Affordance Selection for Planning »
Vivek Veeriah · Zeyu Zheng · Richard L Lewis · Satinder Singh -
2021 : Introducing Symmetries to Black Box Meta Reinforcement Learning »
Louis Kirsch · Sebastian Flennerhag · Hado van Hasselt · Abram Friesen · Junhyuk Oh · Yutian Chen -
2021 : Introducing Symmetries to Black Box Meta Reinforcement Learning »
Louis Kirsch · Sebastian Flennerhag · Hado van Hasselt · Abram Friesen · Junhyuk Oh · Yutian Chen -
2022 : In-Context Policy Iteration »
Ethan Brooks · Logan Walls · Richard L Lewis · Satinder Singh -
2022 : In-context Reinforcement Learning with Algorithm Distillation »
Michael Laskin · Luyu Wang · Junhyuk Oh · Emilio Parisotto · Stephen Spencer · Richie Steigerwald · DJ Strouse · Steven Hansen · Angelos Filos · Ethan Brooks · Maxime Gazeau · Himanshu Sahni · Satinder Singh · Volodymyr Mnih -
2022 : Optimistic Meta-Gradients »
Sebastian Flennerhag · Tom Zahavy · Brendan O'Donoghue · Hado van Hasselt · András György · Satinder Singh -
2022 : In-context Reinforcement Learning with Algorithm Distillation »
Michael Laskin · Luyu Wang · Junhyuk Oh · Emilio Parisotto · Stephen Spencer · Richie Steigerwald · DJ Strouse · Steven Hansen · Angelos Filos · Ethan Brooks · Maxime Gazeau · Himanshu Sahni · Satinder Singh · Volodymyr Mnih -
2022 Poster: Palm up: Playing in the Latent Manifold for Unsupervised Pretraining »
Hao Liu · Tom Zahavy · Volodymyr Mnih · Satinder Singh -
2022 Poster: Approximate Value Equivalence »
Christopher Grimm · Andre Barreto · Satinder Singh -
2022 Poster: Planning to the Information Horizon of BAMDPs via Epistemic State Abstraction »
Dilip Arumugam · Satinder Singh -
2021 : Reducing the Information Horizon of Bayes-Adaptive Markov Decision Processes via Epistemic State Abstraction »
Dilip Arumugam · Satinder Singh -
2021 Workshop: Deep Reinforcement Learning »
Pieter Abbeel · Chelsea Finn · David Silver · Matthew Taylor · Martha White · Srijita Das · Yuqing Du · Andrew Patterson · Manan Tomar · Olivia Watkins -
2021 Poster: On the Expressivity of Markov Reward »
David Abel · Will Dabney · Anna Harutyunyan · Mark Ho · Michael Littman · Doina Precup · Satinder Singh -
2021 Poster: Reward is enough for convex MDPs »
Tom Zahavy · Brendan O'Donoghue · Guillaume Desjardins · Satinder Singh -
2021 Poster: Proper Value Equivalence »
Christopher Grimm · Andre Barreto · Greg Farquhar · David Silver · Satinder Singh -
2021 Poster: Discovery of Options via Meta-Learned Subgoals »
Vivek Veeriah · Tom Zahavy · Matteo Hessel · Zhongwen Xu · Junhyuk Oh · Iurii Kemaev · Hado van Hasselt · David Silver · Satinder Singh -
2021 Poster: Learning State Representations from Random Deep Action-conditional Predictions »
Zeyu Zheng · Vivek Veeriah · Risto Vuorio · Richard L Lewis · Satinder Singh -
2021 Poster: Self-Consistent Models and Values »
Greg Farquhar · Kate Baumli · Zita Marinho · Angelos Filos · Matteo Hessel · Hado van Hasselt · David Silver -
2021 Poster: Online and Offline Reinforcement Learning by Planning with a Learned Model »
Julian Schrittwieser · Thomas Hubert · Amol Mandhane · Mohammadamin Barekatain · Ioannis Antonoglou · David Silver -
2021 Oral: On the Expressivity of Markov Reward »
David Abel · Will Dabney · Anna Harutyunyan · Mark Ho · Michael Littman · Doina Precup · Satinder Singh -
2020 Workshop: Deep Reinforcement Learning »
Pieter Abbeel · Chelsea Finn · Joelle Pineau · David Silver · Satinder Singh · Coline Devin · Misha Laskin · Kimin Lee · Janarthanan Rajendran · Vivek Veeriah -
2020 Poster: Discovering Reinforcement Learning Algorithms »
Junhyuk Oh · Matteo Hessel · Wojciech Czarnecki · Zhongwen Xu · Hado van Hasselt · Satinder Singh · David Silver -
2020 Poster: Meta-Gradient Reinforcement Learning with an Objective Discovered Online »
Zhongwen Xu · Hado van Hasselt · Matteo Hessel · Junhyuk Oh · Satinder Singh · David Silver -
2020 Poster: Learning to Play No-Press Diplomacy with Best Response Policy Iteration »
Thomas Anthony · Tom Eccles · Andrea Tacchetti · János Kramár · Ian Gemp · Thomas Hudson · Nicolas Porcel · Marc Lanctot · Julien Perolat · Richard Everett · Satinder Singh · Thore Graepel · Yoram Bachrach -
2020 Spotlight: Learning to Play No-Press Diplomacy with Best Response Policy Iteration »
Thomas Anthony · Tom Eccles · Andrea Tacchetti · János Kramár · Ian Gemp · Thomas Hudson · Nicolas Porcel · Marc Lanctot · Julien Perolat · Richard Everett · Satinder Singh · Thore Graepel · Yoram Bachrach -
2020 Poster: A Self-Tuning Actor-Critic Algorithm »
Tom Zahavy · Zhongwen Xu · Vivek Veeriah · Matteo Hessel · Junhyuk Oh · Hado van Hasselt · David Silver · Satinder Singh -
2020 Poster: On Efficiency in Hierarchical Reinforcement Learning »
Zheng Wen · Doina Precup · Morteza Ibrahimi · Andre Barreto · Benjamin Van Roy · Satinder Singh -
2020 Poster: Forethought and Hindsight in Credit Assignment »
Veronica Chelu · Doina Precup · Hado van Hasselt -
2020 Poster: The Value Equivalence Principle for Model-Based Reinforcement Learning »
Christopher Grimm · Andre Barreto · Satinder Singh · David Silver -
2020 Spotlight: On Efficiency in Hierarchical Reinforcement Learning »
Zheng Wen · Doina Precup · Morteza Ibrahimi · Andre Barreto · Benjamin Van Roy · Satinder Singh -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 : Late-Breaking Papers (Talks) »
David Silver · Simon Du · Matthias Plappert -
2019 : Poster Spotlight 1 »
David Brandfonbrener · Joan Bruna · Tom Zahavy · Haim Kaplan · Yishay Mansour · Nikos Karampatziakis · John Langford · Paul Mineiro · Donghwan Lee · Niao He -
2019 Workshop: Deep Reinforcement Learning »
Pieter Abbeel · Chelsea Finn · Joelle Pineau · David Silver · Satinder Singh · Joshua Achiam · Carlos Florensa · Christopher Grimm · Haoran Tang · Vivek Veeriah -
2019 : Coffee/Poster session 1 »
Shiro Takagi · Khurram Javed · Johanna Sommer · Amr Sharaf · Pierluca D'Oro · Ying Wei · Sivan Doveh · Colin White · Santiago Gonzalez · Cuong Nguyen · Mao Li · Tianhe Yu · Tiago Ramalho · Masahiro Nomura · Ahsan Alvi · Jean-Francois Ton · W. Ronny Huang · Jessica Lee · Sebastian Flennerhag · Michael Zhang · Abram Friesen · Paul Blomstedt · Alina Dubatovka · Sergey Bartunov · Subin Yi · Iaroslav Shcherbatyi · Christian Simon · Zeyuan Shang · David MacLeod · Lu Liu · Liam Fowl · Diego Mesquita · Deirdre Quillen -
2019 Poster: Discovery of Useful Questions as Auxiliary Tasks »
Vivek Veeriah · Matteo Hessel · Zhongwen Xu · Janarthanan Rajendran · Richard L Lewis · Junhyuk Oh · Hado van Hasselt · David Silver · Satinder Singh -
2019 Poster: When to use parametric models in reinforcement learning? »
Hado van Hasselt · Matteo Hessel · John Aslanides -
2019 Poster: Hindsight Credit Assignment »
Anna Harutyunyan · Will Dabney · Thomas Mesnard · Mohammad Gheshlaghi Azar · Bilal Piot · Nicolas Heess · Hado van Hasselt · Gregory Wayne · Satinder Singh · Doina Precup · Remi Munos -
2019 Spotlight: Hindsight Credit Assignment »
Anna Harutyunyan · Will Dabney · Thomas Mesnard · Mohammad Gheshlaghi Azar · Bilal Piot · Nicolas Heess · Hado van Hasselt · Gregory Wayne · Satinder Singh · Doina Precup · Remi Munos -
2018 : Transferring Knowledge across Learning Processes »
Sebastian Flennerhag -
2018 : David Silver »
David Silver -
2018 Workshop: Wordplay: Reinforcement and Language Learning in Text-based Games »
Adam Trischler · Angeliki Lazaridou · Yonatan Bisk · Wendy Tay · Nate Kushman · Marc-Alexandre Côté · Alessandro Sordoni · Daniel Ricks · Tom Zahavy · Hal Daumé III -
2018 Workshop: Deep Reinforcement Learning »
Pieter Abbeel · David Silver · Satinder Singh · Joelle Pineau · Joshua Achiam · Rein Houthooft · Aravind Srinivas -
2018 Poster: Meta-Gradient Reinforcement Learning »
Zhongwen Xu · Hado van Hasselt · David Silver -
2018 Poster: Learn What Not to Learn: Action Elimination with Deep Reinforcement Learning »
Tom Zahavy · Matan Haroush · Nadav Merlis · Daniel J Mankowitz · Shie Mannor -
2018 Poster: Breaking the Activation Function Bottleneck through Adaptive Parameterization »
Sebastian Flennerhag · Hujun Yin · John Keane · Mark Elliot -
2017 : Panel Discussion »
Matt Botvinick · Emma Brunskill · Marcos Campos · Jan Peters · Doina Precup · David Silver · Josh Tenenbaum · Roy Fox -
2017 : Deep Reinforcement Learning with Subgoals (David Silver) »
David Silver -
2017 : Poster session 2 and coffee break »
Sean McGregor · Tobias Hagge · Markus Stoye · Trang Thi Minh Pham · Seungkyun Hong · Amir Farbin · Sungyong Seo · Susana Zoghbi · Daniel George · Stanislav Fort · Steven Farrell · Arthur Pajot · Kyle Pearson · Adam McCarthy · Cecile Germain · Dustin Anderson · Mario Lezcano Casado · Mayur Mudigonda · Benjamin Nachman · Luke de Oliveira · Li Jing · Lingge Li · Soo Kyung Kim · Timothy Gebhard · Tom Zahavy -
2017 : Poster session 1 and coffee break »
Tobias Hagge · Sean McGregor · Markus Stoye · Trang Thi Minh Pham · Seungkyun Hong · Amir Farbin · Sungyong Seo · Susana Zoghbi · Daniel George · Stanislav Fort · Steven Farrell · Arthur Pajot · Kyle Pearson · Adam McCarthy · Cecile Germain · Dustin Anderson · Mario Lezcano Casado · Mayur Mudigonda · Benjamin Nachman · Luke de Oliveira · Li Jing · Lingge Li · Soo Kyung Kim · Timothy Gebhard · Tom Zahavy -
2017 Symposium: Deep Reinforcement Learning »
Pieter Abbeel · Yan Duan · David Silver · Satinder Singh · Junhyuk Oh · Rein Houthooft -
2017 Poster: Natural Value Approximators: Learning when to Trust Past Estimates »
Zhongwen Xu · Joseph Modayil · Hado van Hasselt · Andre Barreto · David Silver · Tom Schaul -
2017 Poster: State Aware Imitation Learning »
Yannick Schroecker · Charles Isbell -
2017 Poster: Successor Features for Transfer in Reinforcement Learning »
Andre Barreto · Will Dabney · Remi Munos · Jonathan Hunt · Tom Schaul · David Silver · Hado van Hasselt -
2017 Poster: A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning »
Marc Lanctot · Vinicius Zambaldi · Audrunas Gruslys · Angeliki Lazaridou · Karl Tuyls · Julien Perolat · David Silver · Thore Graepel -
2017 Poster: Imagination-Augmented Agents for Deep Reinforcement Learning »
Sébastien Racanière · Theophane Weber · David Reichert · Lars Buesing · Arthur Guez · Danilo Jimenez Rezende · Adrià Puigdomènech Badia · Oriol Vinyals · Nicolas Heess · Yujia Li · Razvan Pascanu · Peter Battaglia · Demis Hassabis · David Silver · Daan Wierstra -
2017 Spotlight: Successor Features for Transfer in Reinforcement Learning »
Andre Barreto · Will Dabney · Remi Munos · Jonathan Hunt · Tom Schaul · David Silver · Hado van Hasselt -
2017 Spotlight: Natural Value Approximators: Learning when to Trust Past Estimates »
Zhongwen Xu · Joseph Modayil · Hado van Hasselt · Andre Barreto · David Silver · Tom Schaul -
2017 Oral: Imagination-Augmented Agents for Deep Reinforcement Learning »
Sébastien Racanière · Theophane Weber · David Reichert · Lars Buesing · Arthur Guez · Danilo Jimenez Rezende · Adrià Puigdomènech Badia · Oriol Vinyals · Nicolas Heess · Yujia Li · Razvan Pascanu · Peter Battaglia · Demis Hassabis · David Silver · Daan Wierstra -
2017 Poster: Shallow Updates for Deep Reinforcement Learning »
Nir Levine · Tom Zahavy · Daniel J Mankowitz · Aviv Tamar · Shie Mannor -
2016 Poster: Learning values across many orders of magnitude »
Hado van Hasselt · Arthur Guez · Arthur Guez · Matteo Hessel · Volodymyr Mnih · David Silver -
2015 Workshop: Deep Reinforcement Learning »
Pieter Abbeel · John Schulman · Satinder Singh · David Silver -
2015 Poster: Learning Continuous Control Policies by Stochastic Value Gradients »
Nicolas Heess · Gregory Wayne · David Silver · Timothy Lillicrap · Tom Erez · Yuval Tassa -
2014 Poster: Weighted importance sampling for off-policy learning with linear function approximation »
Rupam Mahmood · Hado P van Hasselt · Richard Sutton -
2010 Poster: Double Q-learning »
Hado P van Hasselt