Skip to yearly menu bar Skip to main content


Poster

Planning in entropy-regularized Markov decision processes and games

Jean-Bastien Grill · Omar Darwiche Domingues · Pierre Menard · Remi Munos · Michal Valko

East Exhibition Hall B + C #217

Keywords: [ Planning; Reinforcemen ] [ Reinforcement Learning and Planning -> Markov Decision Processes; Reinforcement Learning and Planning ] [ Reinforcement Learning and Planning ]


Abstract: We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the SmoothCruiser. SmoothCruiser makes use of the smoothness of the Bellman operator promoted by the regularization to achieve problem-independent sample complexity of order $\tilde{\mathcal{O}}(1/\epsilon^4)$ for a desired accuracy $\epsilon$, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.

Live content is unavailable. Log in and register to view live content