Timezone: »
In this work, we take a fresh look at some old and new algorithms for off-policy, return-based reinforcement learning. Expressing these in a common form, we derive a novel algorithm, Retrace(lambda), with three desired properties: (1) it has low variance; (2) it safely uses samples collected from any behaviour policy, whatever its degree of "off-policyness"; and (3) it is efficient as it makes the best use of samples collected from near on-policy behaviour policies. We analyse the contractive nature of the related operator under both off-policy policy evaluation and control settings and derive online sample-based algorithms. We believe this is the first return-based off-policy control algorithm converging a.s. to Q* without the GLIE assumption (Greedy in the Limit with Infinite Exploration). As a corollary, we prove the convergence of Watkins' Q(lambda), which was an open problem since 1989. We illustrate the benefits of Retrace(lambda) on a standard suite of Atari 2600 games.
Author Information
Remi Munos (Google DeepMind)
Tom Stepleton (Google DeepMind)
Anna Harutyunyan (Vrije Universiteit Brussel)
Marc Bellemare (Google DeepMind)
More from the Same Authors
-
2022 : A Novel Stochastic Gradient Descent Algorithm for LearningPrincipal Subspaces »
Charline Le Lan · Joshua Greaves · Jesse Farebrother · Mark Rowland · Fabian Pedregosa · Rishabh Agarwal · Marc Bellemare -
2022 : Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks »
Jesse Farebrother · Joshua Greaves · Rishabh Agarwal · Charline Le Lan · Ross Goroshin · Pablo Samuel Castro · Marc Bellemare -
2022 : Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks »
Jesse Farebrother · Joshua Greaves · Rishabh Agarwal · Charline Le Lan · Ross Goroshin · Pablo Samuel Castro · Marc Bellemare -
2022 : Curiosity in Hindsight »
Daniel Jarrett · Corentin Tallec · Florent Altché · Thomas Mesnard · Remi Munos · Michal Valko -
2022 : Proto-Value Networks: Scaling Representation Learning with Auxiliary Tasks »
Jesse Farebrother · Joshua Greaves · Rishabh Agarwal · Charline Le Lan · Ross Goroshin · Pablo Samuel Castro · Marc Bellemare -
2022 : Variance Double-Down: The Small Batch Size Anomaly in Multistep Deep Reinforcement Learning »
Johan Obando Ceron · Marc Bellemare · Pablo Samuel Castro -
2022 : Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier »
Pierluca D'Oro · Max Schwarzer · Evgenii Nikishin · Pierre-Luc Bacon · Marc Bellemare · Aaron Courville -
2022 : Investigating Multi-task Pretraining and Generalization in Reinforcement Learning »
Adrien Ali Taiga · Rishabh Agarwal · Jesse Farebrother · Aaron Courville · Marc Bellemare -
2022 Spotlight: Lightning Talks 4A-4 »
Yunhao Tang · LING LIANG · Thomas Chau · Daeha Kim · Junbiao Cui · Rui Lu · Lei Song · Byung Cheol Song · Andrew Zhao · Remi Munos · Łukasz Dudziak · Jiye Liang · Ke Xue · Kaidi Xu · Mark Rowland · Hongkai Wen · Xing Hu · Xiaobin Huang · Simon Du · Nicholas Lane · Chao Qian · Lei Deng · Bernardo Avila Pires · Gao Huang · Will Dabney · Mohamed Abdelfattah · Yuan Xie · Marc Bellemare -
2022 Spotlight: Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees »
Daniil Tiapkin · Denis Belomestny · Daniele Calandriello · Eric Moulines · Remi Munos · Alexey Naumov · Mark Rowland · Michal Valko · Pierre Ménard -
2022 Spotlight: The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning »
Yunhao Tang · Remi Munos · Mark Rowland · Bernardo Avila Pires · Will Dabney · Marc Bellemare -
2022 Poster: BYOL-Explore: Exploration by Bootstrapped Prediction »
Zhaohan Guo · Shantanu Thakoor · Miruna Pislar · Bernardo Avila Pires · Florent Altché · Corentin Tallec · Alaa Saade · Daniele Calandriello · Jean-Bastien Grill · Yunhao Tang · Michal Valko · Remi Munos · Mohammad Gheshlaghi Azar · Bilal Piot -
2022 Poster: Reincarnating Reinforcement Learning: Reusing Prior Computation to Accelerate Progress »
Rishabh Agarwal · Max Schwarzer · Pablo Samuel Castro · Aaron Courville · Marc Bellemare -
2022 Poster: The Nature of Temporal Difference Errors in Multi-step Distributional Reinforcement Learning »
Yunhao Tang · Remi Munos · Mark Rowland · Bernardo Avila Pires · Will Dabney · Marc Bellemare -
2022 Poster: Optimistic Posterior Sampling for Reinforcement Learning with Few Samples and Tight Guarantees »
Daniil Tiapkin · Denis Belomestny · Daniele Calandriello · Eric Moulines · Remi Munos · Alexey Naumov · Mark Rowland · Michal Valko · Pierre Ménard -
2021 Workshop: Ecological Theory of Reinforcement Learning: How Does Task Design Influence Agent Learning? »
Manfred Díaz · Hiroki Furuta · Elise van der Pol · Lisa Lee · Shixiang (Shane) Gu · Pablo Samuel Castro · Simon Du · Marc Bellemare · Sergey Levine -
2021 Poster: Learning in two-player zero-sum partially observable Markov games with perfect recall »
Tadashi Kozuno · Pierre Ménard · Remi Munos · Michal Valko -
2021 Oral: Deep Reinforcement Learning at the Edge of the Statistical Precipice »
Rishabh Agarwal · Max Schwarzer · Pablo Samuel Castro · Aaron Courville · Marc Bellemare -
2021 Poster: Deep Reinforcement Learning at the Edge of the Statistical Precipice »
Rishabh Agarwal · Max Schwarzer · Pablo Samuel Castro · Aaron Courville · Marc Bellemare -
2021 Poster: Unifying Gradient Estimators for Meta-Reinforcement Learning via Off-Policy Evaluation »
Yunhao Tang · Tadashi Kozuno · Mark Rowland · Remi Munos · Michal Valko -
2020 : Contributed Talk #3: Contrastive Behavioral Similarity Embeddings for Generalization in Reinforcement Learning »
Rishabh Agarwal · Marlos C. Machado · Pablo Samuel Castro · Marc Bellemare -
2020 : Panel discussion »
Pierre-Yves Oudeyer · Marc Bellemare · Peter Stone · Matt Botvinick · Susan Murphy · Anusha Nagabandi · Ashley Edwards · Karen Liu · Pieter Abbeel -
2020 : Invited talk: Marc Bellemare "Autonomous navigation of stratospheric balloons using reinforcement learning" »
Marc Bellemare -
2020 Poster: Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning »
Nino Vieillard · Tadashi Kozuno · Bruno Scherrer · Olivier Pietquin · Remi Munos · Matthieu Geist -
2020 Poster: Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning »
Jean-Bastien Grill · Florian Strub · Florent Altché · Corentin Tallec · Pierre Richemond · Elena Buchatskaya · Carl Doersch · Bernardo Avila Pires · Daniel (Zhaohan) Guo · Mohammad Gheshlaghi Azar · Bilal Piot · koray kavukcuoglu · Remi Munos · Michal Valko -
2020 Oral: Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning »
Nino Vieillard · Tadashi Kozuno · Bruno Scherrer · Olivier Pietquin · Remi Munos · Matthieu Geist -
2020 Oral: Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning »
Jean-Bastien Grill · Florian Strub · Florent Altché · Corentin Tallec · Pierre Richemond · Elena Buchatskaya · Carl Doersch · Bernardo Avila Pires · Daniel (Zhaohan) Guo · Mohammad Gheshlaghi Azar · Bilal Piot · koray kavukcuoglu · Remi Munos · Michal Valko -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 : Poster Spotlight 2 »
Aaron Sidford · Mengdi Wang · Lin Yang · Yinyu Ye · Zuyue Fu · Zhuoran Yang · Yongxin Chen · Zhaoran Wang · Ofir Nachum · Bo Dai · Ilya Kostrikov · Dale Schuurmans · Ziyang Tang · Yihao Feng · Lihong Li · Denny Zhou · Qiang Liu · Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Simon Du · Sham Kakade · Ruosong Wang · Minshuo Chen · Tianyi Liu · Xingguo Li · Zhaoran Wang · Tuo Zhao · Philip Amortila · Doina Precup · Prakash Panangaden · Marc Bellemare -
2019 Poster: Planning in entropy-regularized Markov decision processes and games »
Jean-Bastien Grill · Omar Darwiche Domingues · Pierre Menard · Remi Munos · Michal Valko -
2019 Poster: Multiagent Evaluation under Incomplete Information »
Mark Rowland · Shayegan Omidshafiei · Karl Tuyls · Julien Perolat · Michal Valko · Georgios Piliouras · Remi Munos -
2019 Spotlight: Multiagent Evaluation under Incomplete Information »
Mark Rowland · Shayegan Omidshafiei · Karl Tuyls · Julien Perolat · Michal Valko · Georgios Piliouras · Remi Munos -
2019 Poster: A Geometric Perspective on Optimal Representations for Reinforcement Learning »
Marc Bellemare · Will Dabney · Robert Dadashi · Adrien Ali Taiga · Pablo Samuel Castro · Nicolas Le Roux · Dale Schuurmans · Tor Lattimore · Clare Lyle -
2019 Poster: Hindsight Credit Assignment »
Anna Harutyunyan · Will Dabney · Thomas Mesnard · Mohammad Gheshlaghi Azar · Bilal Piot · Nicolas Heess · Hado van Hasselt · Gregory Wayne · Satinder Singh · Doina Precup · Remi Munos -
2019 Spotlight: Hindsight Credit Assignment »
Anna Harutyunyan · Will Dabney · Thomas Mesnard · Mohammad Gheshlaghi Azar · Bilal Piot · Nicolas Heess · Hado van Hasselt · Gregory Wayne · Satinder Singh · Doina Precup · Remi Munos -
2018 Poster: Optimistic optimization of a Brownian »
Jean-Bastien Grill · Michal Valko · Remi Munos -
2018 Poster: Actor-Critic Policy Optimization in Partially Observable Multiagent Environments »
Sriram Srinivasan · Marc Lanctot · Vinicius Zambaldi · Julien Perolat · Karl Tuyls · Remi Munos · Michael Bowling -
2017 : Poster Session »
David Abel · Nicholas Denis · Maria Eckstein · Ronan Fruit · Karan Goel · Joshua Gruenstein · Anna Harutyunyan · Martin Klissarov · Xiangyu Kong · Aviral Kumar · Saurabh Kumar · Miao Liu · Daniel McNamee · Shayegan Omidshafiei · Silviu Pitis · Paulo Rauber · Melrose Roderick · Tianmin Shu · Yizhou Wang · Shangtong Zhang -
2017 : Spotlights & Poster Session »
David Abel · Nicholas Denis · Maria Eckstein · Ronan Fruit · Karan Goel · Joshua Gruenstein · Anna Harutyunyan · Martin Klissarov · Xiangyu Kong · Aviral Kumar · Saurabh Kumar · Miao Liu · Daniel McNamee · Shayegan Omidshafiei · Silviu Pitis · Paulo Rauber · Melrose Roderick · Tianmin Shu · Yizhou Wang · Shangtong Zhang -
2017 : Best Paper Award and Talk — Learning with options that terminate off-policy (Anna Harutyunyan) »
Anna Harutyunyan -
2017 Poster: Successor Features for Transfer in Reinforcement Learning »
Andre Barreto · Will Dabney · Remi Munos · Jonathan Hunt · Tom Schaul · David Silver · Hado van Hasselt -
2017 Spotlight: Successor Features for Transfer in Reinforcement Learning »
Andre Barreto · Will Dabney · Remi Munos · Jonathan Hunt · Tom Schaul · David Silver · Hado van Hasselt -
2016 Workshop: The Future of Interactive Machine Learning »
Kory Mathewson @korymath · Kaushik Subramanian · Mark Ho · Robert Loftin · Joseph L Austerweil · Anna Harutyunyan · Doina Precup · Layla El Asri · Matthew Gombolay · Jerry Zhu · Sonia Chernova · Charles Isbell · Patrick M Pilarski · Weng-Keen Wong · Manuela Veloso · Julie A Shah · Matthew Taylor · Brenna Argall · Michael Littman -
2016 Poster: Unifying Count-Based Exploration and Intrinsic Motivation »
Marc Bellemare · Sriram Srinivasan · Georg Ostrovski · Tom Schaul · David Saxton · Remi Munos -
2016 Poster: Memory-Efficient Backpropagation Through Time »
Audrunas Gruslys · Remi Munos · Ivo Danihelka · Marc Lanctot · Alex Graves -
2016 Poster: Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning »
Jean-Bastien Grill · Michal Valko · Remi Munos -
2016 Oral: Blazing the trails before beating the path: Sample-efficient Monte-Carlo planning »
Jean-Bastien Grill · Michal Valko · Remi Munos -
2015 Poster: Black-box optimization of noisy functions with unknown smoothness »
Jean-Bastien Grill · Michal Valko · Remi Munos · Remi Munos