Timezone: »
Poster
Exploration Bonus for Regret Minimization in Discrete and Continuous Average Reward MDPs
Jian QIAN · Ronan Fruit · Matteo Pirotta · Alessandro Lazaric
Tue Dec 10 05:30 PM -- 07:30 PM (PST) @ East Exhibition Hall B + C #182
The exploration bonus is an effective approach to manage the exploration-exploitation trade-off in Markov Decision Processes (MDPs).
While it has been analyzed in infinite-horizon discounted and finite-horizon problems, we focus on designing and analysing the exploration bonus in the more challenging infinite-horizon undiscounted setting.
We first introduce SCAL+, a variant of SCAL (Fruit et al. 2018), that uses a suitable exploration bonus to solve any discrete unknown weakly-communicating MDP for which an upper bound $c$ on the span of the optimal bias function is known. We prove that SCAL+ enjoys the same regret guarantees as SCAL, which relies on the less efficient extended value iteration approach.
Furthermore, we leverage the flexibility provided by the exploration bonus scheme to generalize SCAL+ to smooth MDPs with continuous state space and discrete actions. We show that the resulting algorithm (SCCAL+) achieves the same regret bound as UCCRL (Ortner and Ryabko, 2012) while being the first implementable algorithm for this setting.
Author Information
Jian QIAN (INRIA Lille - Sequel Team)
Ronan Fruit (Inria Lille)
Matteo Pirotta (Facebook AI Research)
Alessandro Lazaric (Facebook Artificial Intelligence Research)
More from the Same Authors
-
2021 Spotlight: Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret »
Jean Tarbouriech · Runlong Zhou · Simon Du · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2021 Spotlight: A Provably Efficient Sample Collection Strategy for Reinforcement Learning »
Jean Tarbouriech · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2022 Poster: Scalable Representation Learning in Linear Contextual Bandits with Constant Regret Guarantees »
Andrea Tirinzoni · Matteo Papini · Ahmed Touati · Alessandro Lazaric · Matteo Pirotta -
2021 Poster: Local Differential Privacy for Regret Minimization in Reinforcement Learning »
Evrard Garcelon · Vianney Perchet · Ciara Pike-Burke · Matteo Pirotta -
2021 Poster: Stochastic Shortest Path: Minimax, Parameter-Free and Towards Horizon-Free Regret »
Jean Tarbouriech · Runlong Zhou · Simon Du · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2021 Poster: A Provably Efficient Sample Collection Strategy for Reinforcement Learning »
Jean Tarbouriech · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2021 Poster: Reinforcement Learning in Linear MDPs: Constant Regret and Representation Selection »
Matteo Papini · Andrea Tirinzoni · Aldo Pacchiano · Marcello Restelli · Alessandro Lazaric · Matteo Pirotta -
2020 Poster: An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits »
Andrea Tirinzoni · Matteo Pirotta · Marcello Restelli · Alessandro Lazaric -
2020 Poster: Adversarial Attacks on Linear Contextual Bandits »
Evrard Garcelon · Baptiste Roziere · Laurent Meunier · Jean Tarbouriech · Olivier Teytaud · Alessandro Lazaric · Matteo Pirotta -
2020 Poster: Improved Sample Complexity for Incremental Autonomous Exploration in MDPs »
Jean Tarbouriech · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2020 Poster: Provably Efficient Reward-Agnostic Navigation with Linear Value Iteration »
Andrea Zanette · Alessandro Lazaric · Mykel J Kochenderfer · Emma Brunskill -
2020 Oral: Improved Sample Complexity for Incremental Autonomous Exploration in MDPs »
Jean Tarbouriech · Matteo Pirotta · Michal Valko · Alessandro Lazaric -
2019 : Poster Session »
Ahana Ghosh · Javad Shafiee · Akhilan Boopathy · Alex Tamkin · Theodoros Vasiloudis · Vedant Nanda · Ali Baheri · Paul Fieguth · Andrew Bennett · Guanya Shi · Hao Liu · Arushi Jain · Jacob Tyo · Benjie Wang · Boxiao Chen · Carroll Wainwright · Chandramouli Shama Sastry · Chao Tang · Daniel S. Brown · David Inouye · David Venuto · Dhruv Ramani · Dimitrios Diochnos · Divyam Madaan · Dmitrii Krashenikov · Joel Oren · Doyup Lee · Eleanor Quint · elmira amirloo · Matteo Pirotta · Gavin Hartnett · Geoffroy Dubourg-Felonneau · Gokul Swamy · Pin-Yu Chen · Ilija Bogunovic · Jason Carter · Javier Garcia-Barcos · Jeet Mohapatra · Jesse Zhang · Jian Qian · John Martin · Oliver Richter · Federico Zaiter · Tsui-Wei Weng · Karthik Abinav Sankararaman · Kyriakos Polymenakos · Lan Hoang · mahdieh abbasi · Marco Gallieri · Mathieu Seurin · Matteo Papini · Matteo Turchetta · Matthew Sotoudeh · Mehrdad Hosseinzadeh · Nathan Fulton · Masatoshi Uehara · Niranjani Prasad · Oana-Maria Camburu · Patrik Kolaric · Philipp Renz · Prateek Jaiswal · Reazul Hasan Russel · Riashat Islam · Rishabh Agarwal · Alexander Aldrick · Sachin Vernekar · Sahin Lale · Sai Kiran Narayanaswami · Samuel Daulton · Sanjam Garg · Sebastian East · Shun Zhang · Soheil Dsidbari · Justin Goodwin · Victoria Krakovna · Wenhao Luo · Wesley Chung · Yuanyuan Shi · Yuh-Shyang Wang · Hongwei Jin · Ziping Xu -
2019 Poster: A Structured Prediction Approach for Generalization in Cooperative Multi-Agent Reinforcement Learning »
Nicolas Carion · Nicolas Usunier · Gabriel Synnaeve · Alessandro Lazaric -
2019 Spotlight: A Structured Prediction Approach for Generalization in Cooperative Multi-Agent Reinforcement Learning »
Nicolas Carion · Nicolas Usunier · Gabriel Synnaeve · Alessandro Lazaric -
2019 Poster: Limiting Extrapolation in Linear Approximate Value Iteration »
Andrea Zanette · Alessandro Lazaric · Mykel J Kochenderfer · Emma Brunskill -
2019 Poster: Regret Bounds for Learning State Representations in Reinforcement Learning »
Ronald Ortner · Matteo Pirotta · Alessandro Lazaric · Ronan Fruit · Odalric-Ambrym Maillard -
2018 Poster: Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes »
Ronan Fruit · Matteo Pirotta · Alessandro Lazaric -
2018 Spotlight: Near Optimal Exploration-Exploitation in Non-Communicating Markov Decision Processes »
Ronan Fruit · Matteo Pirotta · Alessandro Lazaric -
2017 : Poster Session »
David Abel · Nicholas Denis · Maria Eckstein · Ronan Fruit · Karan Goel · Joshua Gruenstein · Anna Harutyunyan · Martin Klissarov · Xiangyu Kong · Aviral Kumar · Saurabh Kumar · Miao Liu · Daniel McNamee · Shayegan Omidshafiei · Silviu Pitis · Paulo Rauber · Melrose Roderick · Tianmin Shu · Yizhou Wang · Shangtong Zhang -
2017 : Spotlights & Poster Session »
David Abel · Nicholas Denis · Maria Eckstein · Ronan Fruit · Karan Goel · Joshua Gruenstein · Anna Harutyunyan · Martin Klissarov · Xiangyu Kong · Aviral Kumar · Saurabh Kumar · Miao Liu · Daniel McNamee · Shayegan Omidshafiei · Silviu Pitis · Paulo Rauber · Melrose Roderick · Tianmin Shu · Yizhou Wang · Shangtong Zhang -
2017 Poster: Compatible Reward Inverse Reinforcement Learning »
Alberto Maria Metelli · Matteo Pirotta · Marcello Restelli -
2017 Poster: Regret Minimization in MDPs with Options without Prior Knowledge »
Ronan Fruit · Matteo Pirotta · Alessandro Lazaric · Emma Brunskill -
2017 Poster: Adaptive Batch Size for Safe Policy Gradients »
Matteo Papini · Matteo Pirotta · Marcello Restelli -
2017 Spotlight: Regret Minimization in MDPs with Options without Prior Knowledge »
Ronan Fruit · Matteo Pirotta · Alessandro Lazaric · Emma Brunskill -
2013 Poster: Adaptive Step-Size for Policy Gradient Methods »
Matteo Pirotta · Marcello Restelli · Luca Bascetta