Poster
(More) Efficient Reinforcement Learning via Posterior Sampling
Ian Osband · Daniel Russo · Benjamin Van Roy
Sun Dec 08 02:00 PM -- 06:00 PM (PST) @ Harrah's Special Events Center, 2nd Floor
Most provably efficient learning algorithms introduce optimism about poorly-understood states and actions to encourage exploration. We study an alternative approach for efficient exploration, posterior sampling for reinforcement learning (PSRL). This algorithm proceeds in repeated episodes of known duration. At the start of each episode, PSRL updates its posterior distribution over Markov decision processes, conditioning the prior on the data observed so far, and takes one sample from this posterior. PSRL then follows the policy that is optimal for this sample during the episode. The algorithm is conceptually simple, computationally efficient and allows an agent to encode prior knowledge in a natural way. We establish an $\tilde{O}(\tau S \sqrt{AT} )$ bound on the expected regret, where $T$ is time, $\tau$ is the episode length and $S$ and $A$ are the cardinalities of the state and action spaces. This bound is one of the first for an algorithm not based on optimism and close to the state of the art for any reinforcement learning algorithm. We show through simulation that PSRL significantly outperforms existing algorithms with similar regret bounds.
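The episode loop described in the abstract can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes a small tabular MDP, an independent Dirichlet(1, ..., 1) prior on each transition row, a standard normal prior on each mean reward with unit-variance Gaussian reward noise, and a fixed start state 0. The function names (solve_finite_horizon, sample_dirichlet, psrl) are hypothetical and chosen only for this example.

```python
import random

def solve_finite_horizon(P, R, tau):
    """Backward induction on one sampled MDP.
    P[s][a][s2] are transition probabilities, R[s][a] are mean rewards.
    Returns policy[h][s], the greedy action at step h in state s."""
    S, A = len(R), len(R[0])
    V = [0.0] * S
    policy = [[0] * S for _ in range(tau)]
    for h in reversed(range(tau)):
        new_V = [0.0] * S
        for s in range(S):
            q = [R[s][a] + sum(P[s][a][s2] * V[s2] for s2 in range(S))
                 for a in range(A)]
            best = max(range(A), key=q.__getitem__)
            policy[h][s] = best
            new_V[s] = q[best]
        V = new_V
    return policy

def sample_dirichlet(alpha, rng):
    """Draw from a Dirichlet distribution by normalizing Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def psrl(true_P, true_R, tau, episodes, seed=0):
    """Run the PSRL sketch; returns the expected reward collected per episode."""
    rng = random.Random(seed)
    S, A = len(true_R), len(true_R[0])
    # Assumed prior: Dirichlet(1, ..., 1) pseudo-counts for every (s, a) row.
    counts = [[[1.0] * S for _ in range(A)] for _ in range(S)]
    r_sum = [[0.0] * A for _ in range(S)]
    r_cnt = [[0] * A for _ in range(S)]
    returns = []
    for _ in range(episodes):
        # 1. Take one sample MDP from the current posterior.
        P = [[sample_dirichlet(counts[s][a], rng) for a in range(A)]
             for s in range(S)]
        # Assumed N(0, 1) prior and unit noise give a N(sum/(n+1), 1/(n+1)) posterior.
        R = [[rng.gauss(r_sum[s][a] / (r_cnt[s][a] + 1),
                        (r_cnt[s][a] + 1) ** -0.5)
              for a in range(A)] for s in range(S)]
        # 2. Follow the policy that is optimal for the sample for one episode.
        policy = solve_finite_horizon(P, R, tau)
        s, total = 0, 0.0
        for h in range(tau):
            a = policy[h][s]
            r = true_R[s][a] + rng.gauss(0.0, 1.0)  # noisy observed reward
            s2 = rng.choices(range(S), weights=true_P[s][a])[0]
            # 3. Update the posterior statistics with the observed transition.
            counts[s][a][s2] += 1.0
            r_sum[s][a] += r
            r_cnt[s][a] += 1
            total += true_R[s][a]
            s = s2
        returns.append(total)
    return returns
```

Calling psrl(true_P, true_R, tau=10, episodes=200) on a small hand-built MDP returns one value per episode. Exploration here comes only from the randomness of the posterior sample: no optimism bonus is added to the rewards or values.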
Author Information
Ian Osband (DeepMind)
Daniel Russo (Columbia University)
Benjamin Van Roy (Stanford University)
More from the Same Authors
-
2021 : On Adaptivity and Confounding in Contextual Bandit Experiments »
Chao Qin · Daniel Russo -
2022 : On Rate-Distortion Theory in Capacity-Limited Cognition & Reinforcement Learning »
Dilip Arumugam · Mark Ho · Noah Goodman · Benjamin Van Roy -
2022 Poster: An Information-Theoretic Framework for Deep Learning »
Hong Jun Jeon · Benjamin Van Roy -
2022 Poster: The Neural Testbed: Evaluating Joint Predictions »
Ian Osband · Zheng Wen · Seyed Mohammad Asghari · Vikranth Dwaracherla · Xiuyuan Lu · Morteza Ibrahimi · Dieterich Lawson · Botao Hao · Brendan O'Donoghue · Benjamin Van Roy -
2022 Poster: Temporally-Consistent Survival Analysis »
Lucas Maystre · Daniel Russo -
2022 Poster: Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning »
Dilip Arumugam · Benjamin Van Roy -
2021 : Environment Capacity »
Benjamin Van Roy -
2021 Poster: The Value of Information When Deciding What to Learn »
Dilip Arumugam · Benjamin Van Roy -
2019 : Poster Session »
Matthia Sabatelli · Adam Stooke · Amir Abdi · Paulo Rauber · Leonard Adolphs · Ian Osband · Hardik Meisheri · Karol Kurach · Johannes Ackermann · Matt Benatan · GUO ZHANG · Chen Tessler · Dinghan Shen · Mikayel Samvelyan · Riashat Islam · Murtaza Dalal · Luke Harries · Andrey Kurenkov · Konrad Żołna · Sudeep Dasari · Kristian Hartikainen · Ofir Nachum · Kimin Lee · Markus Holzleitner · Vu Nguyen · Francis Song · Christopher Grimm · Felipe Leno da Silva · Yuping Luo · Yifan Wu · Alex Lee · Thomas Paine · Wei-Yang Qu · Daniel Graves · Yannis Flet-Berliac · Yunhao Tang · Suraj Nair · Matthew Hausknecht · Akhil Bagaria · Simon Schmitt · Bowen Baker · Paavo Parmas · Benjamin Eysenbach · Lisa Lee · Siyu Lin · Daniel Seita · Abhishek Gupta · Riley Simmons-Edler · Yijie Guo · Kevin Corder · Vikash Kumar · Scott Fujimoto · Adam Lerer · Ignasi Clavera Gilaberte · Nicholas Rhinehart · Ashvin Nair · Ge Yang · Lingxiao Wang · Sungryull Sohn · J. Fernando Hernandez-Garcia · Xian Yeow Lee · Rupesh Srivastava · Khimya Khetarpal · Chenjun Xiao · Luckeciano Carvalho Melo · Rishabh Agarwal · Tianhe Yu · Glen Berseth · Devendra Singh Chaplot · Jie Tang · Anirudh Srinivasan · Tharun Kumar Reddy Medini · Aaron Havens · Misha Laskin · Asier Mujika · Rohan Saphal · Joseph Marino · Alex Ray · Joshua Achiam · Ajay Mandlekar · Zhuang Liu · Danijar Hafner · Zhiwen Tang · Ted Xiao · Michael Walton · Jeff Druce · Ferran Alet · Zhang-Wei Hong · Stephanie Chan · Anusha Nagabandi · Hao Liu · Hao Sun · Ge Liu · Dinesh Jayaraman · John Co-Reyes · Sophia Sanborn -
2019 : Reinforcement Learning Beyond Optimization »
Benjamin Van Roy -
2019 : Poster and Coffee Break 1 »
Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova -
2019 Poster: Information-Theoretic Confidence Bounds for Reinforcement Learning »
Xiuyuan Lu · Benjamin Van Roy -
2019 Poster: Worst-Case Regret Bounds for Exploration via Randomized Value Functions »
Daniel Russo -
2018 Poster: An Information-Theoretic Analysis for Thompson Sampling with Many Actions »
Shi Dong · Benjamin Van Roy -
2018 Poster: Scalable Coordinated Exploration in Concurrent Reinforcement Learning »
Maria Dimakopoulou · Ian Osband · Benjamin Van Roy -
2018 Poster: Randomized Prior Functions for Deep Reinforcement Learning »
Ian Osband · John Aslanides · Albin Cassirer -
2018 Spotlight: Randomized Prior Functions for Deep Reinforcement Learning »
Ian Osband · John Aslanides · Albin Cassirer -
2017 Poster: Ensemble Sampling »
Xiuyuan Lu · Benjamin Van Roy -
2017 Poster: Conservative Contextual Linear Bandits »
Abbas Kazerouni · Mohammad Ghavamzadeh · Yasin Abbasi · Benjamin Van Roy -
2017 Poster: Improving the Expected Improvement Algorithm »
Chao Qin · Diego Klabjan · Daniel Russo -
2016 Poster: Deep Exploration via Bootstrapped DQN »
Ian Osband · Charles Blundell · Alexander Pritzel · Benjamin Van Roy -
2014 Workshop: Large-scale reinforcement learning and Markov decision problems »
Benjamin Van Roy · Mohammad Ghavamzadeh · Peter Bartlett · Yasin Abbasi Yadkori · Ambuj Tewari -
2014 Poster: Near-optimal Reinforcement Learning in Factored MDPs »
Ian Osband · Benjamin Van Roy -
2014 Poster: Learning to Optimize via Information-Directed Sampling »
Daniel Russo · Benjamin Van Roy -
2014 Spotlight: Near-optimal Reinforcement Learning in Factored MDPs »
Ian Osband · Benjamin Van Roy -
2014 Poster: Model-based Reinforcement Learning and the Eluder Dimension »
Ian Osband · Benjamin Van Roy -
2013 Poster: Eluder Dimension and the Sample Complexity of Optimistic Exploration »
Daniel Russo · Benjamin Van Roy -
2013 Oral: Eluder Dimension and the Sample Complexity of Optimistic Exploration »
Daniel Russo · Benjamin Van Roy -
2013 Poster: Efficient Exploration and Value Function Generalization in Deterministic Systems »
Zheng Wen · Benjamin Van Roy -
2012 Poster: Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems »
Morteza Ibrahimi · Adel Javanmard · Benjamin Van Roy -
2009 Poster: Directed Regression »
Yi-Hao Kao · Benjamin Van Roy · Xiang Yan