Poster
Model-based Reinforcement Learning and the Eluder Dimension
Ian Osband · Benjamin Van Roy
We consider the problem of learning to optimize an unknown Markov decision process (MDP). We show that, if the MDP can be parameterized within some known function class, we can obtain regret bounds that scale with the dimensionality, rather than the cardinality, of the system. We characterize this dependence explicitly as $\tilde{O}(\sqrt{d_K d_E T})$, where $T$ is the time elapsed, $d_K$ is the Kolmogorov dimension, and $d_E$ is the \emph{eluder dimension}. These are the first unified regret bounds for model-based reinforcement learning and provide state-of-the-art guarantees in several important settings. Moreover, we present a simple and computationally efficient algorithm, \emph{posterior sampling for reinforcement learning} (PSRL), that satisfies these bounds.
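As a rough illustration of the PSRL loop referenced in the abstract, the sketch below shows the tabular special case: sample an MDP from the posterior, solve it, act greedily under the sample for one episode, and update the posterior with the observed transitions. The Dirichlet transition prior, the known reward matrix R, and the toy environment interface (reset()/step(action) returning (next_state, reward)) are illustrative assumptions, not the paper's general parameterized-function-class formulation.

```python
# Minimal PSRL sketch for a tabular MDP (illustrative assumptions: Dirichlet
# transition prior, known rewards R, toy env exposing reset()/step()).
import numpy as np

def solve_sampled_mdp(P, R, horizon):
    """Finite-horizon value iteration on a sampled MDP.
    P: (S, A, S) transition probabilities; R: (S, A) expected rewards."""
    S, A, _ = P.shape
    V = np.zeros(S)
    policy = np.zeros((horizon, S), dtype=int)
    for h in reversed(range(horizon)):
        Q = R + P @ V              # (S, A) action values under continuation V
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy

def psrl(env, S, A, R, horizon, n_episodes, seed=0):
    """Each episode: sample an MDP from the posterior, act greedily w.r.t. it."""
    rng = np.random.default_rng(seed)
    counts = np.ones((S, A, S))    # Dirichlet(1,...,1) prior pseudo-counts
    for _ in range(n_episodes):
        # 1. Sample transition probabilities from the posterior.
        P = np.array([[rng.dirichlet(counts[s, a]) for a in range(A)]
                      for s in range(S)])
        # 2. Compute the optimal policy of the sampled MDP.
        policy = solve_sampled_mdp(P, R, horizon)
        # 3. Execute that policy for one episode and update the posterior.
        s = env.reset()
        for h in range(horizon):
            a = policy[h, s]
            s_next, _reward = env.step(a)
            counts[s, a, s_next] += 1
            s = s_next
    return counts
```

The paper's regret analysis applies to general parameterized model classes through the Kolmogorov and eluder dimensions; this tabular sketch only conveys the structure of the sampling loop.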
Author Information
Ian Osband (DeepMind)
Benjamin Van Roy (Stanford University)
More from the Same Authors
- 2022: On Rate-Distortion Theory in Capacity-Limited Cognition & Reinforcement Learning »
  Dilip Arumugam · Mark Ho · Noah Goodman · Benjamin Van Roy
- 2022 Poster: An Information-Theoretic Framework for Deep Learning »
  Hong Jun Jeon · Benjamin Van Roy
- 2022 Poster: Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning »
  Dilip Arumugam · Benjamin Van Roy
- 2021: Environment Capacity »
  Benjamin Van Roy
- 2021 Poster: The Value of Information When Deciding What to Learn »
  Dilip Arumugam · Benjamin Van Roy
- 2019: Reinforcement Learning Beyond Optimization »
  Benjamin Van Roy
- 2019 Poster: Information-Theoretic Confidence Bounds for Reinforcement Learning »
  Xiuyuan Lu · Benjamin Van Roy
- 2018 Poster: An Information-Theoretic Analysis for Thompson Sampling with Many Actions »
  Shi Dong · Benjamin Van Roy
- 2018 Poster: Scalable Coordinated Exploration in Concurrent Reinforcement Learning »
  Maria Dimakopoulou · Ian Osband · Benjamin Van Roy
- 2017 Poster: Ensemble Sampling »
  Xiuyuan Lu · Benjamin Van Roy
- 2017 Poster: Conservative Contextual Linear Bandits »
  Abbas Kazerouni · Mohammad Ghavamzadeh · Yasin Abbasi · Benjamin Van Roy
- 2016 Poster: Deep Exploration via Bootstrapped DQN »
  Ian Osband · Charles Blundell · Alexander Pritzel · Benjamin Van Roy
- 2014 Workshop: Large-scale reinforcement learning and Markov decision problems »
  Benjamin Van Roy · Mohammad Ghavamzadeh · Peter Bartlett · Yasin Abbasi Yadkori · Ambuj Tewari
- 2014 Poster: Near-optimal Reinforcement Learning in Factored MDPs »
  Ian Osband · Benjamin Van Roy
- 2014 Poster: Learning to Optimize via Information-Directed Sampling »
  Daniel Russo · Benjamin Van Roy
- 2014 Spotlight: Near-optimal Reinforcement Learning in Factored MDPs »
  Ian Osband · Benjamin Van Roy
- 2013 Poster: (More) Efficient Reinforcement Learning via Posterior Sampling »
  Ian Osband · Daniel Russo · Benjamin Van Roy
- 2013 Poster: Eluder Dimension and the Sample Complexity of Optimistic Exploration »
  Daniel Russo · Benjamin Van Roy
- 2013 Oral: Eluder Dimension and the Sample Complexity of Optimistic Exploration »
  Daniel Russo · Benjamin Van Roy
- 2013 Poster: Efficient Exploration and Value Function Generalization in Deterministic Systems »
  Zheng Wen · Benjamin Van Roy
- 2012 Poster: Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems »
  Morteza Ibrahimi · Adel Javanmard · Benjamin Van Roy
- 2009 Poster: Directed Regression »
  Yi-Hao Kao · Benjamin Van Roy · Xiang Yan