Poster
Near-optimal Reinforcement Learning in Factored MDPs
Ian Osband · Benjamin Van Roy
Any reinforcement learning algorithm that applies to all Markov decision processes (MDPs) will suffer $\Omega(\sqrt{SAT})$ regret on some MDP, where $T$ is the elapsed time and $S$ and $A$ are the cardinalities of the state and action spaces. This implies $T = \Omega(SA)$ time to guarantee a near-optimal policy. In many settings of practical interest, due to the curse of dimensionality, $S$ and $A$ can be so enormous that this learning time is unacceptable. We establish that, if the system is known to be a \emph{factored} MDP, it is possible to achieve regret that scales polynomially in the number of \emph{parameters} encoding the factored MDP, which may be exponentially smaller than $S$ or $A$. We provide two algorithms that satisfy near-optimal regret bounds in this context: posterior sampling reinforcement learning (PSRL) and an upper confidence bound algorithm (UCRL-Factored).
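The abstract names posterior sampling reinforcement learning (PSRL) as one of the two algorithms. As a rough illustration of the posterior-sampling idea, here is a minimal sketch of *tabular* episodic PSRL (not the factored variant analyzed in the paper): each episode, sample an MDP from the posterior (Dirichlet over transitions, a simple Gaussian-style posterior over mean rewards), solve it by value iteration, and act greedily under the sample. All function and variable names here are illustrative, not from the paper.

```python
import numpy as np

def solve_mdp(P, R, horizon):
    """Finite-horizon value iteration; returns a greedy policy for each step."""
    S, A, _ = P.shape
    V = np.zeros(S)
    policy = np.zeros((horizon, S), dtype=int)
    for h in reversed(range(horizon)):
        Q = R + P @ V                # Q[s, a] = R[s, a] + sum_s' P[s, a, s'] V[s']
        policy[h] = Q.argmax(axis=1)
        V = Q.max(axis=1)
    return policy

def psrl(env_step, env_reset, S, A, horizon, num_episodes, rng):
    """Tabular PSRL sketch: sample an MDP from the posterior each episode,
    act greedily under the sample, then update the posterior with the data."""
    trans_counts = np.ones((S, A, S))   # Dirichlet(1) prior over transitions
    rew_sum = np.zeros((S, A))          # sufficient statistics for mean rewards
    rew_count = np.ones((S, A))
    total_reward = 0.0
    for _ in range(num_episodes):
        # Draw one plausible MDP from the current posterior.
        P = np.array([[rng.dirichlet(trans_counts[s, a]) for a in range(A)]
                      for s in range(S)])
        R = rew_sum / rew_count + rng.standard_normal((S, A)) / np.sqrt(rew_count)
        policy = solve_mdp(P, R, horizon)
        s = env_reset()
        for h in range(horizon):
            a = policy[h, s]
            s_next, r = env_step(s, a)
            trans_counts[s, a, s_next] += 1
            rew_sum[s, a] += r
            rew_count[s, a] += 1
            total_reward += r
            s = s_next
    return total_reward

# Hypothetical toy environment: a 3-state chain, reward 1 for reaching the end.
S_, A_, H = 3, 2, 5
rng = np.random.default_rng(0)

def reset():
    return 0

def step(s, a):
    s_next = min(s + 1, S_ - 1) if a == 1 else max(s - 1, 0)
    return s_next, (1.0 if s_next == S_ - 1 else 0.0)

total = psrl(step, reset, S_, A_, H, num_episodes=50, rng=rng)
```

The factored version in the paper exploits the same sampling loop but represents the transition model as a product of small factors over scoped variables, so the posterior (and hence the regret) scales with the number of factor parameters rather than with $S$ or $A$.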
Author Information
Ian Osband (DeepMind)
Benjamin Van Roy (Stanford University)
Related Events (a corresponding poster, oral, or spotlight)
- 2014 Spotlight: Near-optimal Reinforcement Learning in Factored MDPs
  Wed. Dec 10th 08:30 -- 08:50 PM, Level 2, Room 210
More from the Same Authors
- 2022: On Rate-Distortion Theory in Capacity-Limited Cognition & Reinforcement Learning
  Dilip Arumugam · Mark Ho · Noah Goodman · Benjamin Van Roy
- 2022 Poster: An Information-Theoretic Framework for Deep Learning
  Hong Jun Jeon · Benjamin Van Roy
- 2022 Poster: Deciding What to Model: Value-Equivalent Sampling for Reinforcement Learning
  Dilip Arumugam · Benjamin Van Roy
- 2021: Environment Capacity
  Benjamin Van Roy
- 2021 Poster: The Value of Information When Deciding What to Learn
  Dilip Arumugam · Benjamin Van Roy
- 2019: Reinforcement Learning Beyond Optimization
  Benjamin Van Roy
- 2019 Poster: Information-Theoretic Confidence Bounds for Reinforcement Learning
  Xiuyuan Lu · Benjamin Van Roy
- 2018 Poster: An Information-Theoretic Analysis for Thompson Sampling with Many Actions
  Shi Dong · Benjamin Van Roy
- 2018 Poster: Scalable Coordinated Exploration in Concurrent Reinforcement Learning
  Maria Dimakopoulou · Ian Osband · Benjamin Van Roy
- 2017 Poster: Ensemble Sampling
  Xiuyuan Lu · Benjamin Van Roy
- 2017 Poster: Conservative Contextual Linear Bandits
  Abbas Kazerouni · Mohammad Ghavamzadeh · Yasin Abbasi · Benjamin Van Roy
- 2016 Poster: Deep Exploration via Bootstrapped DQN
  Ian Osband · Charles Blundell · Alexander Pritzel · Benjamin Van Roy
- 2014 Workshop: Large-scale reinforcement learning and Markov decision problems
  Benjamin Van Roy · Mohammad Ghavamzadeh · Peter Bartlett · Yasin Abbasi Yadkori · Ambuj Tewari
- 2014 Poster: Learning to Optimize via Information-Directed Sampling
  Daniel Russo · Benjamin Van Roy
- 2014 Poster: Model-based Reinforcement Learning and the Eluder Dimension
  Ian Osband · Benjamin Van Roy
- 2013 Poster: (More) Efficient Reinforcement Learning via Posterior Sampling
  Ian Osband · Daniel Russo · Benjamin Van Roy
- 2013 Poster: Eluder Dimension and the Sample Complexity of Optimistic Exploration
  Daniel Russo · Benjamin Van Roy
- 2013 Oral: Eluder Dimension and the Sample Complexity of Optimistic Exploration
  Daniel Russo · Benjamin Van Roy
- 2013 Poster: Efficient Exploration and Value Function Generalization in Deterministic Systems
  Zheng Wen · Benjamin Van Roy
- 2012 Poster: Efficient Reinforcement Learning for High Dimensional Linear Quadratic Systems
  Morteza Ibrahimi · Adel Javanmard · Benjamin Van Roy
- 2009 Poster: Directed Regression
  Yi-Hao Kao · Benjamin Van Roy · Xiang Yan