Poster

Planning in entropy-regularized Markov decision processes and games

Jean-Bastien Grill ⋅ Omar Darwiche Domingues ⋅ Pierre Menard ⋅ Remi Munos ⋅ Michal Valko

Keywords: Reinforcement Learning and Planning; Reinforcement Learning and Planning -> Markov Decision Processes; Reinforcement Learning and Planning -> Planning; Reinforcemen

2019 Poster


Abstract

We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser exploits the smoothness of the Bellman operator induced by the regularization to achieve a problem-independent sample complexity of order $\tilde{\mathcal{O}}(1/\epsilon^4)$ for a desired accuracy $\epsilon$, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
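
To illustrate what "smoothness of the Bellman operator" can mean in practice, the sketch below runs plain tabular value iteration with an entropy-regularized backup. It is not SmoothCruiser and does not use a generative model or sampling; it only assumes the common log-sum-exp form of the regularized maximum, which the abstract does not spell out. The temperature `lam`, the helper names, and the two-state MDP are hypothetical and chosen purely for illustration.

```python
import math

def soft_max(values, lam):
    """Entropy-regularized max: lam * log(sum(exp(v / lam))).

    Assumed form of the regularized backup; smooth in `values`,
    and it approaches max(values) as lam -> 0.
    """
    m = max(values)  # subtract the max for numerical stability
    return m + lam * math.log(sum(math.exp((v - m) / lam) for v in values))

def soft_value_iteration(P, R, gamma, lam, n_iters=500):
    """Iterate the regularized Bellman operator on a tabular MDP.

    P[s][a] is a list of (probability, next_state) pairs,
    R[s][a] is the reward for taking action a in state s.
    """
    V = [0.0] * len(P)
    for _ in range(n_iters):
        V = [
            soft_max(
                [R[s][a] + gamma * sum(p * V[s2] for p, s2 in P[s][a])
                 for a in range(len(P[s]))],
                lam,
            )
            for s in range(len(P))
        ]
    return V

# Hypothetical two-state, two-action MDP (illustration only).
P = [
    [[(1.0, 0)], [(1.0, 1)]],
    [[(1.0, 0)], [(0.5, 0), (0.5, 1)]],
]
R = [[0.0, 1.0], [0.5, 1.0]]

V_soft = soft_value_iteration(P, R, gamma=0.9, lam=0.1)    # regularized
V_hard = soft_value_iteration(P, R, gamma=0.9, lam=1e-3)   # nearly unregularized
print(V_soft, V_hard)
```

With K actions, the log-sum-exp backup exceeds the hard maximum by at most `lam * log(K)` per step, so under this assumed form the regularized values stay within `lam * log(K) / (1 - gamma)` of the unregularized ones while the operator itself becomes differentiable.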
