Timezone: »
Bootstrapped value estimation has become a widely adopted ingredient for modern reinforcement learning algorithms. These methods compute a target value based on observed data and predictions for future values. The approximation error of the target value, which comes from stochastic dynamics and inaccurate predictions, can significantly affect the data efficiency of RL algorithms. Multi-step methods, such as n-step Q learning and TD(lambda), leverage the chain structure of the data, alleviating the effect of inaccurate predictions and allowing credit assignment across a longer time horizon. However, the main limitation of such multi-step methods is that they fail to exploit the graph structure of certain MDPs by only treating each trajectory independently, resulting in an inadequate estimate of the target value that misses the intersections between multiple trajectories. In this paper, we propose to treat the transition data of an MDP as a graph, and define a novel backup operator exploiting this graph structure. Comparing to multi-step backup, our graph backup method allows counterfactual credit assignment, and can reduce the variance that comes from stochastic environment dynamics. Our empirical evaluation on MiniGrid and Minatar shows graph backup can greatly improve data efficiency compared to one-step and multi-step backup.
Author Information
zhengyao Jiang (University College London)
Tianjun Zhang (University of California, Berkeley)
Robert Kirk (University College London)
I’m Robert Kirk, a PhD Student at UCL DARK Lab in the UCL Centre for Artificial Intelligence supervised by Tim Rocktäschel and Ed Grefenstette. I’m an aspiring effective altruist and rationalist. I’m interested in reinforcement learning, meta learning, natural language processing, interpretability and deep learning (and all the combinations thereof).
Tim Rocktäschel (Facebook AI Research)
Edward Grefenstette (Facebook AI Research & University College London)
More from the Same Authors
-
2021 : MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research »
Mikayel Samvelyan · Robert Kirk · Vitaly Kurin · Jack Parker-Holder · Minqi Jiang · Eric Hambro · Fabio Petroni · Heinrich Kuttler · Edward Grefenstette · Tim Rocktäschel -
2021 : Grounding Aleatoric Uncertainty in Unsupervised Environment Design »
Minqi Jiang · Michael Dennis · Jack Parker-Holder · Andrei Lupu · Heinrich Kuttler · Edward Grefenstette · Tim Rocktäschel · Jakob Foerster -
2021 : C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks »
Tianjun Zhang · Ben Eysenbach · Russ Salakhutdinov · Sergey Levine · Joseph Gonzalez -
2021 : That Escalated Quickly: Compounding Complexity by Editing Levels at the Frontier of Agent Capabilities »
Jack Parker-Holder · Minqi Jiang · Michael Dennis · Mikayel Samvelyan · Jakob Foerster · Edward Grefenstette · Tim Rocktäschel -
2021 : Return Dispersion as an Estimator of Learning Potential for Prioritized Level Replay »
Iryna Korshunova · Minqi Jiang · Jack Parker-Holder · Tim Rocktäschel · Edward Grefenstette -
2022 : Domain Generalization for Robust Model-Based Offline Reinforcement Learning »
Alan Clark · Shoaib Siddiqui · Robert Kirk · Usman Anwar · Stephen Chung · David Krueger -
2022 : Domain Generalization for Robust Model-Based Offline RL »
Alan Clark · Shoaib Siddiqui · Robert Kirk · Usman Anwar · Stephen Chung · David Krueger -
2022 : Efficient Planning in a Compact Latent Action Space »
zhengyao Jiang · Tianjun Zhang · Michael Janner · Yueying (Lisa) Li · Tim Rocktäschel · Edward Grefenstette · Yuandong Tian -
2022 : Optimal Transport for Offline Imitation Learning »
Yicheng Luo · zhengyao Jiang · Samuel Cohen · Edward Grefenstette · Marc Deisenroth -
2022 Workshop: LaReL: Language and Reinforcement Learning »
Laetitia Teodorescu · Laura Ruis · Tristan Karch · Cédric Colas · Paul Barde · Jelena Luketina · Athul Jacob · Pratyusha Sharma · Edward Grefenstette · Jacob Andreas · Marc-Alexandre Côté -
2022 Poster: Learning General World Models in a Handful of Reward-Free Deployments »
Yingchen Xu · Jack Parker-Holder · Aldo Pacchiano · Philip Ball · Oleh Rybkin · S Roberts · Tim Rocktäschel · Edward Grefenstette -
2022 Poster: Grounding Aleatoric Uncertainty for Unsupervised Environment Design »
Minqi Jiang · Michael Dennis · Jack Parker-Holder · Andrei Lupu · Heinrich Küttler · Edward Grefenstette · Tim Rocktäschel · Jakob Foerster -
2022 Poster: Improving Policy Learning via Language Dynamics Distillation »
Victor Zhong · Jesse Mu · Luke Zettlemoyer · Edward Grefenstette · Tim Rocktäschel -
2022 Poster: Contrastive Learning as Goal-Conditioned Reinforcement Learning »
Benjamin Eysenbach · Tianjun Zhang · Sergey Levine · Russ Salakhutdinov -
2022 Poster: Improving Intrinsic Exploration with Language Abstractions »
Jesse Mu · Victor Zhong · Roberta Raileanu · Minqi Jiang · Noah Goodman · Tim Rocktäschel · Edward Grefenstette -
2021 : NeurIPS RL Competitions Results Presentations »
Rohin Shah · Liam Paull · Tabitha Lee · Tim Rocktäschel · Heinrich Küttler · Sharada Mohanty · Manuel Wuethrich -
2021 : The NetHack Challenge + Q&A »
Eric Hambro · Sharada Mohanty · Dipam Chakrabroty · Edward Grefenstette · Minqi Jiang · Robert Kirk · Vitaly Kurin · Heinrich Kuttler · Vegard Mella · Nantas Nardelli · Jack Parker-Holder · Roberta Raileanu · Tim Rocktäschel · Danielle Rothermel · Mikayel Samvelyan -
2021 Poster: Replay-Guided Adversarial Environment Design »
Minqi Jiang · Michael Dennis · Jack Parker-Holder · Jakob Foerster · Edward Grefenstette · Tim Rocktäschel -
2021 Poster: NovelD: A Simple yet Effective Exploration Criterion »
Tianjun Zhang · Huazhe Xu · Xiaolong Wang · Yi Wu · Kurt Keutzer · Joseph Gonzalez · Yuandong Tian -
2021 Poster: MADE: Exploration via Maximizing Deviation from Explored Regions »
Tianjun Zhang · Paria Rashidinejad · Jiantao Jiao · Yuandong Tian · Joseph Gonzalez · Stuart Russell -
2021 Poster: Learning Space Partitions for Path Planning »
Kevin Yang · Tianjun Zhang · Chris Cummins · Brandon Cui · Benoit Steiner · Linnan Wang · Joseph Gonzalez · Dan Klein · Yuandong Tian -
2020 : Q&A #1 »
Oren Etzioni · Tim Rocktäschel · Victoria Lin -
2020 : Invited Talk #3 »
Tim Rocktäschel -
2020 Poster: The NetHack Learning Environment »
Heinrich Küttler · Nantas Nardelli · Alexander Miller · Roberta Raileanu · Marco Selvatici · Edward Grefenstette · Tim Rocktäschel -
2020 Poster: Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks »
Patrick Lewis · Ethan Perez · Aleksandra Piktus · Fabio Petroni · Vladimir Karpukhin · Naman Goyal · Heinrich Küttler · Mike Lewis · Wen-tau Yih · Tim Rocktäschel · Sebastian Riedel · Douwe Kiela -
2019 Poster: ANODEV2: A Coupled Neural ODE Framework »
Tianjun Zhang · Zhewei Yao · Amir Gholami · Joseph Gonzalez · Kurt Keutzer · Michael Mahoney · George Biros -
2015 Poster: Teaching Machines to Read and Comprehend »
Karl Moritz Hermann · Tomas Kocisky · Edward Grefenstette · Lasse Espeholt · Will Kay · Mustafa Suleyman · Phil Blunsom -
2015 Poster: Learning to Transduce with Unbounded Memory »
Edward Grefenstette · Karl Moritz Hermann · Mustafa Suleyman · Phil Blunsom