Deep reinforcement learning (RL) is notoriously impractical to deploy due to sample inefficiency. Meta-RL directly addresses this sample inefficiency by learning to perform few-shot learning when a distribution of related tasks is available for meta-training. While many specialized meta-RL methods have been proposed, recent work suggests that end-to-end learning in conjunction with an off-the-shelf sequential model, such as a recurrent network, is a surprisingly strong baseline. However, such claims have been controversial due to limited supporting evidence, particularly in the face of prior work establishing precisely the opposite. In this paper, we conduct an empirical investigation of these claims. While we likewise find that a recurrent network can achieve strong performance, we demonstrate that the use of hypernetworks is crucial to maximizing its potential. Surprisingly, when combined with hypernetworks, recurrent baselines, which are far simpler than existing specialized methods, achieve the strongest performance of all methods evaluated.
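As a rough illustration of what combining a recurrent network with a hypernetwork can look like, the sketch below lets a recurrent hidden state (a summary of the observations and rewards seen so far) generate the weights of a small policy layer. This is a toy written for this page, not the authors' architecture: the class name `RecurrentHyperPolicy`, the plain tanh recurrent cell, the linear hypernetwork, the layer sizes, and the initialization are all assumptions, and no training loop is shown.

```python
import math
import random


def linear(weights, bias, x):
    """Return weights @ x + bias, with weights given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def init_matrix(rows, cols, rng, scale=0.1):
    """Small random matrix; the initialization scheme is an arbitrary choice."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)]
            for _ in range(rows)]


class RecurrentHyperPolicy:
    """Hypothetical toy policy: an RNN state generates the policy layer."""

    def __init__(self, obs_dim, hidden_dim, num_actions, seed=0):
        rng = random.Random(seed)
        self.obs_dim = obs_dim
        self.hidden_dim = hidden_dim
        self.num_actions = num_actions
        # Recurrent cell: h' = tanh(W_in [obs, reward] + W_h h + b_h).
        self.w_in = init_matrix(hidden_dim, obs_dim + 1, rng)
        self.w_h = init_matrix(hidden_dim, hidden_dim, rng)
        self.b_h = [0.0] * hidden_dim
        # Hypernetwork: linear map from h to the flattened policy
        # weights (num_actions x obs_dim) followed by the policy biases.
        n_generated = num_actions * obs_dim + num_actions
        self.w_hyper = init_matrix(n_generated, hidden_dim, rng)
        self.b_hyper = [0.0] * n_generated

    def initial_state(self):
        return [0.0] * self.hidden_dim

    def update(self, h, obs, reward):
        """Fold one (observation, reward) pair into the hidden state."""
        from_input = linear(self.w_in, self.b_h, list(obs) + [reward])
        from_state = linear(self.w_h, [0.0] * self.hidden_dim, h)
        return [math.tanh(a + b) for a, b in zip(from_input, from_state)]

    def action_probs(self, h, obs):
        """Generate the policy layer from h, then apply it to obs."""
        flat = linear(self.w_hyper, self.b_hyper, h)
        n_w = self.num_actions * self.obs_dim
        weights = [flat[i * self.obs_dim:(i + 1) * self.obs_dim]
                   for i in range(self.num_actions)]
        bias = flat[n_w:]
        logits = linear(weights, bias, obs)
        # Numerically stable softmax over the action logits.
        top = max(logits)
        exps = [math.exp(value - top) for value in logits]
        total = sum(exps)
        return [e / total for e in exps]


policy = RecurrentHyperPolicy(obs_dim=3, hidden_dim=4, num_actions=2)
h = policy.initial_state()
obs = [1.0, 0.0, -1.0]
probs_before = policy.action_probs(h, obs)
h = policy.update(h, obs, reward=1.0)
probs_after = policy.action_probs(h, obs)
```

In this sketch the hidden state does not enter the policy as an extra input; it changes the policy's parameters directly, which is the distinguishing feature of a hypernetwork. With a zero initial state and zero hypernetwork biases, the generated layer is all zeros, so the policy starts out uniform over actions and only becomes task-dependent as experience is folded into the hidden state.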
Author Information
Jacob Beck (University of Oxford)
Risto Vuorio (University of Oxford)
I'm a PhD student in WhiRL at the University of Oxford. I'm interested in reinforcement learning and meta-learning.
Zheng Xiong (University of Oxford)
Shimon Whiteson (University of Oxford)
More from the Same Authors
- 2021: No DICE: An Investigation of the Bias-Variance Tradeoff in Meta-Gradients
  Risto Vuorio · Jacob Beck · Greg Farquhar · Jakob Foerster · Shimon Whiteson
- 2021: On the Practical Consistency of Meta-Reinforcement Learning Algorithms
  Zheng Xiong · Luisa Zintgraf · Jacob Beck · Risto Vuorio · Shimon Whiteson
- 2022: Deconfounded Imitation Learning
  Risto Vuorio · Pim de Haan · Johann Brehmer · Hanno Ackermann · Daniel Dijkman · Taco Cohen
- 2023: Policy-Guided Diffusion
  Matthew T Jackson · Michael Matthews · Cong Lu · Jakob Foerster · Shimon Whiteson
- 2023: Discovering Temporally-Aware Reinforcement Learning Algorithms
  Matthew T Jackson · Chris Lu · Louis Kirsch · Robert Lange · Shimon Whiteson · Jakob Foerster
- 2023: JaxMARL: Multi-Agent RL Environments in JAX
  Alexander Rutherford · Benjamin Ellis · Matteo Gallici · Jonathan Cook · Andrei Lupu · Garðar Ingvarsson · Timon Willi · Akbir Khan · Christian Schroeder de Witt · Alexandra Souly · Saptarashmi Bandyopadhyay · Mikayel Samvelyan · Minqi Jiang · Robert Lange · Shimon Whiteson · Bruno Lacerda · Nick Hawes · Tim Rocktäschel · Chris Lu · Jakob Foerster
- 2023 Poster: SMACv2: An Improved Benchmark for Cooperative Multi-Agent Reinforcement Learning
  Benjamin Ellis · Jonathan Cook · Skander Moalla · Mikayel Samvelyan · Mingfei Sun · Anuj Mahajan · Jakob Foerster · Shimon Whiteson
- 2023 Poster: The Waymo Open Sim Agents Challenge
  Nico Montali · John Lambert · Paul Mougin · Alex Kuefler · Nicholas Rhinehart · Michelle Li · Cole Gulino · Tristan Emrich · Zoey Yang · Shimon Whiteson · Brandyn White · Dragomir Anguelov
- 2023 Poster: Discovering General Reinforcement Learning Algorithms with Adversarial Environment Design
  Matthew T Jackson · Minqi Jiang · Jack Parker-Holder · Risto Vuorio · Chris Lu · Greg Farquhar · Shimon Whiteson · Jakob Foerster
- 2022 Workshop: Deep Reinforcement Learning Workshop
  Karol Hausman · Qi Zhang · Matthew Taylor · Martha White · Suraj Nair · Manan Tomar · Risto Vuorio · Ted Xiao · Zeyu Zheng · Manan Tomar
- 2022 Poster: In Defense of the Unitary Scalarization for Deep Multi-Task Learning
  Vitaly Kurin · Alessandro De Palma · Ilya Kostrikov · Shimon Whiteson · Pawan K Mudigonda
- 2022 Poster: Equivariant Networks for Zero-Shot Coordination
  Darius Muglich · Christian Schroeder de Witt · Elise van der Pol · Shimon Whiteson · Jakob Foerster
- 2021 Poster: Learning State Representations from Random Deep Action-conditional Predictions
  Zeyu Zheng · Vivek Veeriah · Risto Vuorio · Richard L Lewis · Satinder Singh
- 2019: Bayes-Adaptive Deep Reinforcement Learning via Meta-Learning - Invited Talk
  Shimon Whiteson
- 2019 Poster: Multimodal Model-Agnostic Meta-Learning via Task-Aware Modulation
  Risto Vuorio · Shao-Hua Sun · Hexiang Hu · Joseph Lim
- 2019 Spotlight: Multimodal Model-Agnostic Meta-Learning via Task-Aware Modulation
  Risto Vuorio · Shao-Hua Sun · Hexiang Hu · Joseph Lim
- 2018: Toward Multimodal Model-Agnostic Meta-Learning
  Risto Vuorio
- 2017 Poster: Dynamic-Depth Context Tree Weighting
  Joao V Messias · Shimon Whiteson