Memory-based meta-learning is a powerful technique for building agents that adapt quickly to any task within a target distribution. A previous theoretical study has argued that this remarkable performance arises because the meta-training protocol incentivises agents to behave Bayes-optimally. We empirically investigate this claim on a number of prediction and bandit tasks. Inspired by ideas from theoretical computer science, we show that meta-learned and Bayes-optimal agents not only behave alike, but even share a similar computational structure, in the sense that one agent system can approximately simulate the other. Furthermore, we show that Bayes-optimal agents are fixed points of the meta-learning dynamics. Our results suggest that memory-based meta-learning is a general technique for numerically approximating Bayes-optimal agents, even for task distributions for which we currently lack tractable models.
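To make "Bayes-optimal" concrete for a prediction task, the sketch below assumes a hypothetical coin-flip (Bernoulli) prediction task whose bias is drawn from a Beta(alpha, beta) prior; it is an illustration under that assumption, not the authors' code or their exact task setup. For this conjugate model the Bayes-optimal prediction is the posterior predictive (alpha + n_ones) / (alpha + beta + n). The helper names `bayes_optimal_predictions` and `cumulative_log_loss` are hypothetical; the log-loss is assumed here as the kind of objective a meta-trained prediction agent would minimize, so its predictions could be compared against these reference values.

```python
import math
from fractions import Fraction


def bayes_optimal_predictions(observations, alpha=1, beta=1):
    """Posterior-predictive P(x_t = 1 | x_1..x_{t-1}) before each step t.

    Assumed (hypothetical) setup: a coin whose bias is drawn from a
    Beta(alpha, beta) prior. alpha and beta are positive integers here
    so that the predictions are exact Fractions.
    """
    predictions = []
    ones_seen = 0
    for t, x in enumerate(observations):
        # Beta-Bernoulli posterior predictive after t observations.
        predictions.append(Fraction(alpha + ones_seen, alpha + beta + t))
        ones_seen += x
    return predictions


def cumulative_log_loss(predictions, observations):
    """Sum of -log p(x_t) (natural log) under the given predictions."""
    total = 0.0
    for p_one, x in zip(predictions, observations):
        # Probability the predictor assigned to the outcome that occurred.
        p_observed = p_one if x == 1 else 1 - p_one
        total -= math.log(p_observed)
    return total


# Usage: Bayes-optimal predictions before each of three coin flips.
seq = [1, 1, 0]
preds = bayes_optimal_predictions(seq)
print(preds)  # → [Fraction(1, 2), Fraction(2, 3), Fraction(3, 4)]
print(cumulative_log_loss(preds, seq))
```

With a uniform Beta(1, 1) prior this reduces to Laplace's rule of succession; any predictor that deviates from these values incurs a higher expected log-loss when tasks are drawn from the assumed prior.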
Author Information
Vladimir Mikulik (Google DeepMind)
Grégoire Delétang (DeepMind)
Tom McGrath (DeepMind)
Tim Genewein (DeepMind)
Miljan Martic (DeepMind)
Shane Legg (DeepMind)
Pedro Ortega (DeepMind)
Related Events (a corresponding poster, oral, or spotlight)
2020 Poster: Meta-trained agents implement Bayes-optimal agents
Wed. Dec 9th 05:00 -- 07:00 PM Room Poster Session 3 #963
More from the Same Authors
2021 : Artificial what?
Shane Legg
2020 Poster: Avoiding Side Effects By Considering Future Tasks
Victoria Krakovna · Laurent Orseau · Richard Ngo · Miljan Martic · Shane Legg
2018 : Panel discussion
Max Welling · Tim Genewein · Edwin Park · Song Han
2018 : TBC 12
Tim Genewein
2018 : Neural network compression in the wild: why aiming for high compression factors is not enough
Tim Genewein
2018 Poster: Reward learning from human preferences and demonstrations in Atari
Borja Ibarz · Jan Leike · Tobias Pohlen · Geoffrey Irving · Shane Legg · Dario Amodei
2017 Poster: Deep Reinforcement Learning from Human Preferences
Paul Christiano · Jan Leike · Tom Brown · Miljan Martic · Shane Legg · Dario Amodei
2016 : Agency and Causality in Decision Making
Pedro Ortega
2016 Poster: Human Decision-Making under Limited Time
Pedro Ortega · Alan A Stocker
2014 Workshop: Novel Trends and Applications in Reinforcement Learning
Csaba Szepesvari · Marc Deisenroth · Sergey Levine · Pedro Ortega · Brian Ziebart · Emma Brunskill · Naftali Tishby · Gerhard Neumann · Daniel Lee · Sridhar Mahadevan · Pieter Abbeel · David Silver · Vicenç Gómez
2013 Workshop: Planning with Information Constraints for Control, Reinforcement Learning, Computational Neuroscience, Robotics and Games
Hilbert J Kappen · Naftali Tishby · Jan Peters · Evangelos Theodorou · David H Wolpert · Pedro Ortega
2012 Poster: A Nonparametric Conjugate Prior Distribution for the Maximizing Argument of a Noisy Function
Pedro Ortega · Tim Genewein · Jordi Grau-Moya · David Balduzzi · Daniel A Braun
2007 Poster: Temporal Difference with Eligibility Traces Derived from First Principles
Marcus Hutter · Shane Legg