Timezone: »
Poster
Is the Bellman residual a bad proxy?
Matthieu Geist · Bilal Piot · Olivier Pietquin
This paper aims at theoretically and empirically comparing two standard optimization criteria for Reinforcement Learning: i) maximization of the mean value and ii) minimization of the Bellman residual. For that purpose, we place ourselves in the framework of policy search algorithms, that are usually designed to maximize the mean value, and derive a method that minimizes the residual $\|T_* v_\pi - v_\pi\|_{1,\nu}$ over policies. A theoretical analysis shows how good this proxy is to policy optimization, and notably that it is better than its value-based counterpart. We also propose experiments on randomly generated generic Markov decision processes, specifically designed for studying the influence of the involved concentrability coefficient. They show that the Bellman residual is generally a bad proxy to policy optimization and that directly maximizing the mean value is much better, despite the current lack of deep theoretical analysis. This might seem obvious, as directly addressing the problem of interest is usually better, but given the prevalence of (projected) Bellman residual minimization in value-based reinforcement learning, we believe that this question is worth to be considered.
Author Information
Matthieu Geist (Université de Lorraine)
Bilal Piot (DeepMind)
Olivier Pietquin (Google Research Brain Team)
More from the Same Authors
-
2021 : Continuous Control with Action Quantization from Demonstrations »
Robert Dadashi · Leonard Hussenot · Damien Vincent · Anton Raichuk · Matthieu Geist · Olivier Pietquin -
2021 : Implicitly Regularized RL with Implicit Q-values »
Nino Vieillard · Marcin Andrychowicz · Anton Raichuk · Olivier Pietquin · Matthieu Geist -
2022 : BLaDE: Robust Exploration via Diffusion Models »
Bilal Piot · Zhaohan Guo · Shantanu Thakoor · Mohammad Gheshlaghi Azar -
2022 Poster: BYOL-Explore: Exploration by Bootstrapped Prediction »
Zhaohan Guo · Shantanu Thakoor · Miruna Pislar · Bernardo Avila Pires · Florent Altché · Corentin Tallec · Alaa Saade · Daniele Calandriello · Jean-Bastien Grill · Yunhao Tang · Michal Valko · Remi Munos · Mohammad Gheshlaghi Azar · Bilal Piot -
2022 Poster: Emergent Communication: Generalization and Overfitting in Lewis Games »
Mathieu Rita · Corentin Tallec · Paul Michel · Jean-Bastien Grill · Olivier Pietquin · Emmanuel Dupoux · Florian Strub -
2021 Poster: Twice regularized MDPs and the equivalence between robustness and regularization »
Esther Derman · Matthieu Geist · Shie Mannor -
2021 Poster: There Is No Turning Back: A Self-Supervised Approach for Reversibility-Aware Reinforcement Learning »
Nathan Grinsztajn · Johan Ferret · Olivier Pietquin · philippe preux · Matthieu Geist -
2021 Poster: What Matters for Adversarial Imitation Learning? »
Manu Orsini · Anton Raichuk · Leonard Hussenot · Damien Vincent · Robert Dadashi · Sertan Girgin · Matthieu Geist · Olivier Bachem · Olivier Pietquin · Marcin Andrychowicz -
2020 Poster: Munchausen Reinforcement Learning »
Nino Vieillard · Olivier Pietquin · Matthieu Geist -
2020 Poster: Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning »
Nino Vieillard · Tadashi Kozuno · Bruno Scherrer · Olivier Pietquin · Remi Munos · Matthieu Geist -
2020 Poster: Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning »
Jean-Bastien Grill · Florian Strub · Florent Altché · Corentin Tallec · Pierre Richemond · Elena Buchatskaya · Carl Doersch · Bernardo Avila Pires · Daniel (Zhaohan) Guo · Mohammad Gheshlaghi Azar · Bilal Piot · koray kavukcuoglu · Remi Munos · Michal Valko -
2020 Oral: Leverage the Average: an Analysis of KL Regularization in Reinforcement Learning »
Nino Vieillard · Tadashi Kozuno · Bruno Scherrer · Olivier Pietquin · Remi Munos · Matthieu Geist -
2020 Oral: Bootstrap Your Own Latent - A New Approach to Self-Supervised Learning »
Jean-Bastien Grill · Florian Strub · Florent Altché · Corentin Tallec · Pierre Richemond · Elena Buchatskaya · Carl Doersch · Bernardo Avila Pires · Daniel (Zhaohan) Guo · Mohammad Gheshlaghi Azar · Bilal Piot · koray kavukcuoglu · Remi Munos · Michal Valko -
2020 Poster: Fictitious Play for Mean Field Games: Continuous Time Analysis and Applications »
Sarah Perrin · Julien Perolat · Mathieu Lauriere · Matthieu Geist · Romuald Elie · Olivier Pietquin -
2019 : Poster and Coffee Break 2 »
Karol Hausman · Kefan Dong · Ken Goldberg · Lihong Li · Lin Yang · Lingxiao Wang · Lior Shani · Liwei Wang · Loren Amdahl-Culleton · Lucas Cassano · Marc Dymetman · Marc Bellemare · Marcin Tomczak · Margarita Castro · Marius Kloft · Marius-Constantin Dinu · Markus Holzleitner · Martha White · Mengdi Wang · Michael Jordan · Mihailo Jovanovic · Ming Yu · Minshuo Chen · Moonkyung Ryu · Muhammad Zaheer · Naman Agarwal · Nan Jiang · Niao He · Nikolaus Yasui · Nikos Karampatziakis · Nino Vieillard · Ofir Nachum · Olivier Pietquin · Ozan Sener · Pan Xu · Parameswaran Kamalaruban · Paul Mineiro · Paul Rolland · Philip Amortila · Pierre-Luc Bacon · Prakash Panangaden · Qi Cai · Qiang Liu · Quanquan Gu · Raihan Seraj · Richard Sutton · Rick Valenzano · Robert Dadashi · Rodrigo Toro Icarte · Roshan Shariff · Roy Fox · Ruosong Wang · Saeed Ghadimi · Samuel Sokota · Sean Sinclair · Sepp Hochreiter · Sergey Levine · Sergio Valcarcel Macua · Sham Kakade · Shangtong Zhang · Sheila McIlraith · Shie Mannor · Shimon Whiteson · Shuai Li · Shuang Qiu · Wai Lok Li · Siddhartha Banerjee · Sitao Luan · Tamer Basar · Thinh Doan · Tianhe Yu · Tianyi Liu · Tom Zahavy · Toryn Klassen · Tuo Zhao · Vicenç Gómez · Vincent Liu · Volkan Cevher · Wesley Suttle · Xiao-Wen Chang · Xiaohan Wei · Xiaotong Liu · Xingguo Li · Xinyi Chen · Xingyou Song · Yao Liu · YiDing Jiang · Yihao Feng · Yilun Du · Yinlam Chow · Yinyu Ye · Yishay Mansour · · Yonathan Efroni · Yongxin Chen · Yuanhao Wang · Bo Dai · Chen-Yu Wei · Harsh Shrivastava · Hongyang Zhang · Qinqing Zheng · SIDDHARTHA SATPATHI · Xueqing Liu · Andreu Vall -
2019 : Poster Presentations »
Rahul Mehta · Andrew Lampinen · Binghong Chen · Sergio Pascual-Diaz · Jordi Grau-Moya · Aldo Faisal · Jonathan Tompson · Yiren Lu · Khimya Khetarpal · Martin Klissarov · Pierre-Luc Bacon · Doina Precup · Thanard Kurutach · Aviv Tamar · Pieter Abbeel · Jinke He · Maximilian Igl · Shimon Whiteson · Wendelin Boehmer · Raphaël Marinier · Olivier Pietquin · Karol Hausman · Sergey Levine · Chelsea Finn · Tianhe Yu · Lisa Lee · Benjamin Eysenbach · Emilio Parisotto · Eric Xing · Ruslan Salakhutdinov · Hongyu Ren · Anima Anandkumar · Deepak Pathak · Christopher Lu · Trevor Darrell · Alexei Efros · Phillip Isola · Feng Liu · Bo Han · Gang Niu · Masashi Sugiyama · Saurabh Kumar · Janith Petangoda · Johan Ferret · James McClelland · Kara Liu · Animesh Garg · Robert Lange -
2019 : Oral Presentations »
Janith Petangoda · Sergio Pascual-Diaz · Jordi Grau-Moya · Raphaël Marinier · Olivier Pietquin · Alexei Efros · Phillip Isola · Trevor Darrell · Christopher Lu · Deepak Pathak · Johan Ferret -
2019 Poster: Budgeted Reinforcement Learning in Continuous State Space »
Nicolas Carrara · Edouard Leurent · Romain Laroche · Tanguy Urvoy · Odalric-Ambrym Maillard · Olivier Pietquin -
2019 Poster: Hindsight Credit Assignment »
Anna Harutyunyan · Will Dabney · Thomas Mesnard · Mohammad Gheshlaghi Azar · Bilal Piot · Nicolas Heess · Hado van Hasselt · Gregory Wayne · Satinder Singh · Doina Precup · Remi Munos -
2019 Spotlight: Hindsight Credit Assignment »
Anna Harutyunyan · Will Dabney · Thomas Mesnard · Mohammad Gheshlaghi Azar · Bilal Piot · Nicolas Heess · Hado van Hasselt · Gregory Wayne · Satinder Singh · Doina Precup · Remi Munos -
2018 Workshop: Visually grounded interaction and language »
Florian Strub · Harm de Vries · Erik Wijmans · Samyak Datta · Ethan Perez · Mateusz Malinowski · Stefan Lee · Peter Anderson · Aaron Courville · Jeremie MARY · Dhruv Batra · Devi Parikh · Olivier Pietquin · Chiori HORI · Tim Marks · Anoop Cherian -
2017 : Panel Discussion »
Felix Hill · Olivier Pietquin · Jack Gallant · Raymond Mooney · Sanja Fidler · Chen Yu · Devi Parikh -
2017 : Dialogue systems and RL: interconnecting language, vision and rewards »
Olivier Pietquin -
2017 Workshop: Visually grounded interaction and language »
Florian Strub · Harm de Vries · Abhishek Das · Satwik Kottur · Stefan Lee · Mateusz Malinowski · Olivier Pietquin · Devi Parikh · Dhruv Batra · Aaron Courville · Jeremie Mary -
2017 Poster: Modulating early visual processing by language »
Harm de Vries · Florian Strub · Jeremie Mary · Hugo Larochelle · Olivier Pietquin · Aaron Courville -
2017 Spotlight: Modulating early visual processing by language »
Harm de Vries · Florian Strub · Jeremie Mary · Hugo Larochelle · Olivier Pietquin · Aaron Courville -
2017 Poster: Reconstruct & Crush Network »
Erinc Merdivan · Mohammad Reza Loghmani · Matthieu Geist -
2016 : Invited Talk: Olivier Pietquin »
Olivier Pietquin -
2015 : Machine Learning For Conversational Systems »
Larry Heck · Li Deng · Olivier Pietquin · Tomas Mikolov -
2014 Poster: Difference of Convex Functions Programming for Reinforcement Learning »
Bilal Piot · Matthieu Geist · Olivier Pietquin -
2014 Spotlight: Difference of Convex Functions Programming for Reinforcement Learning »
Bilal Piot · Matthieu Geist · Olivier Pietquin -
2012 Poster: Inverse Reinforcement Learning through Structured Classification »
Edouard Klein · Matthieu Geist · BILAL PIOT · Olivier Pietquin