The principle of "optimism in the face of uncertainty" underpins many theoretically successful reinforcement learning algorithms. In this paper, we provide a general framework for designing, analyzing, and implementing such algorithms in the episodic reinforcement learning problem. This framework is built upon Lagrangian duality and demonstrates that every model-optimistic algorithm that constructs an optimistic MDP has an equivalent representation as a value-optimistic dynamic programming algorithm. These two classes of algorithms were previously thought to be distinct: model-optimistic algorithms benefit from a cleaner probabilistic analysis, while value-optimistic algorithms are easier to implement and thus more practical. With the framework developed in this paper, we show that it is possible to get the best of both worlds by providing a class of algorithms that have a computationally efficient dynamic-programming implementation and also a simple probabilistic analysis. Besides capturing many existing algorithms in the tabular setting, our framework can also address large-scale problems under realizable function approximation, where it enables a simple model-based analysis of some recently proposed methods.
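To make the idea of computing an optimistic MDP inside a dynamic-programming pass more concrete, here is a minimal illustrative sketch; it is not the algorithm from the paper. It assumes, as a hypothetical choice, that the optimistic transition model at each step is picked from an L1 ball of radius `eps[h][s][a]` around an empirical estimate `p_hat[h][s][a]`. For this particular confidence set, the inner maximisation over models has a well-known greedy solution (the one used in UCRL2's extended value iteration), so the model-optimistic choice is made state by state during ordinary backward induction. All names here (`optimistic_transition`, `optimistic_backward_induction`, `p_hat`, `eps`, `r`) are hypothetical.

```python
from typing import List, Tuple


def optimistic_transition(p_hat: List[float], v: List[float], eps: float) -> List[float]:
    """Maximise sum_s' p[s'] * v[s'] over distributions p with ||p - p_hat||_1 <= eps.

    Greedy solution for the L1 ball (assumed confidence set): move up to eps/2
    probability mass onto the highest-value next state, removing it from the
    lowest-value next states first.
    """
    order = sorted(range(len(p_hat)), key=lambda s: v[s])  # ascending by value
    best = order[-1]
    p = list(p_hat)  # copy so the empirical estimate is not mutated
    p[best] = min(1.0, p_hat[best] + eps / 2.0)
    excess = sum(p) - 1.0  # mass that must be taken away from other states
    for s in order:
        if excess <= 0.0:
            break
        if s == best:
            continue
        removed = min(p[s], excess)
        p[s] -= removed
        excess -= removed
    return p


def optimistic_backward_induction(
    H: int,
    n_states: int,
    n_actions: int,
    r: List[List[List[float]]],
    p_hat: List[List[List[List[float]]]],
    eps: List[List[List[float]]],
) -> Tuple[List[List[List[float]]], List[float]]:
    """Optimistic finite-horizon DP: r[h][s][a] is a reward, p_hat[h][s][a] a
    distribution over next states, eps[h][s][a] the (assumed) L1 radius."""
    v_next = [0.0] * n_states  # value after the last step is zero
    q: List[List[List[float]]] = [[] for _ in range(H)]
    for h in reversed(range(H)):
        q_h = [[0.0] * n_actions for _ in range(n_states)]
        for s in range(n_states):
            for a in range(n_actions):
                # Model-optimistic step: pick the most favourable plausible model.
                p_opt = optimistic_transition(p_hat[h][s][a], v_next, eps[h][s][a])
                q_h[s][a] = r[h][s][a] + sum(pi * vi for pi, vi in zip(p_opt, v_next))
        q[h] = q_h
        v_next = [max(q_h[s]) for s in range(n_states)]
    return q, v_next  # v_next now holds the optimistic value at step 0
```

Read value-optimistically, each backup equals the nominal backup `p_hat · V` plus a data-dependent bonus. Because at most `eps/2` mass is moved from lower-value to higher-value states, this bonus is at most `(eps/2) * (max V - min V)`. This gives a small instance of the kind of correspondence between optimistic models and optimistic value backups that the abstract describes.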
Author Information
Gergely Neu (Universitat Pompeu Fabra)
Ciara Pike-Burke (Imperial College London)
More from the Same Authors
2023 Poster: First- and Second-Order Bounds for Adversarial Linear Contextual Bandits
Iuliia Olkhovskaia · Jack Mayo · Tim van Erven · Gergely Neu · Chen-Yu Wei
2023 Poster: Sample Complexity of Goal-Conditioned Hierarchical Reinforcement Learning
Arnaud Robert · Ciara Pike-Burke · Aldo Faisal
2023 Poster: Optimal Convergence Rate for Exact Policy Mirror Descent in Discounted Markov Decision Processes
Emmeran Johnson · Ciara Pike-Burke · Patrick Rebeschini
2022 Poster: Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits
Gergely Neu · Iuliia Olkhovskaia · Matteo Papini · Ludovic Schwartz
2022 Poster: Proximal Point Imitation Learning
Luca Viano · Angeliki Kamoutsi · Gergely Neu · Igor Krawczuk · Volkan Cevher
2021 Poster: Local Differential Privacy for Regret Minimization in Reinforcement Learning
Evrard Garcelon · Vianney Perchet · Ciara Pike-Burke · Matteo Pirotta
2021 Poster: Online learning in MDPs with linear function approximation and bandit feedback
Gergely Neu · Iuliia Olkhovskaia
2019: Poster and Coffee Break 1
Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova
2019 Poster: Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates
Carlos Riquelme · Hugo Penedones · Damien Vincent · Hartmut Maennel · Sylvain Gelly · Timothy A Mann · Andre Barreto · Gergely Neu
2019 Poster: Beating SGD Saturation with Tail-Averaging and Minibatching
Nicole Muecke · Gergely Neu · Lorenzo Rosasco
2019 Poster: Recovering Bandits
Ciara Pike-Burke · Steffen Grünewälder
2019 Spotlight: Recovering Bandits
Ciara Pike-Burke · Steffen Grünewälder
2017 Poster: Boltzmann Exploration Done Right
Nicolò Cesa-Bianchi · Claudio Gentile · Gergely Neu · Gabor Lugosi
2015: Discussion Panel
Tim van Erven · Wouter Koolen · Peter Grünwald · Shai Ben-David · Dylan Foster · Satyen Kale · Gergely Neu
2015: Adaptive Regret Bounds for Non-Stochastic Bandits
Gergely Neu
2015 Poster: Explore no more: Improved high-probability regret bounds for non-stochastic bandits
Gergely Neu
2014 Poster: Exploiting easy data in online optimization
Amir Sani · Gergely Neu · Alessandro Lazaric
2014 Poster: Efficient learning by implicit exploration in bandit problems with side observations
Tomáš Kocák · Gergely Neu · Michal Valko · Remi Munos
2014 Spotlight: Exploiting easy data in online optimization
Amir Sani · Gergely Neu · Alessandro Lazaric
2014 Poster: Online combinatorial optimization with stochastic decision sets and adversarial losses
Gergely Neu · Michal Valko
2013 Poster: Online learning in episodic Markovian decision processes by relative entropy policy search
Alexander Zimin · Gergely Neu
2010 Spotlight: Online Markov Decision Processes under Bandit Feedback
Gergely Neu · András György · András Antos · Csaba Szepesvari
2010 Poster: Online Markov Decision Processes under Bandit Feedback
Gergely Neu · András György · Csaba Szepesvari · András Antos