Poster
Planning in entropy-regularized Markov decision processes and games
Jean-Bastien Grill · Omar Darwiche Domingues · Pierre Menard · Remi Munos · Michal Valko
East Exhibition Hall B, C #217
Keywords: [ Reinforcement Learning and Planning ] [ Reinforcement Learning and Planning -> Markov Decision Processes ] [ Planning ]
Abstract:
We propose SmoothCruiser, a new planning algorithm for estimating the value function in entropy-regularized Markov decision processes and two-player games, given a generative model of the environment. SmoothCruiser exploits the smoothness of the Bellman operator induced by the regularization to achieve a problem-independent sample complexity of order $\tilde{O}(1/\epsilon^4)$ for a desired accuracy $\epsilon$, whereas for non-regularized settings there are no known algorithms with guaranteed polynomial sample complexity in the worst case.
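To illustrate the smoothness the abstract refers to, here is a minimal sketch of a single entropy-regularized backup over a set of action values. It assumes the standard log-sum-exp form of the regularized maximum with a temperature `lam`; this form, the function names, and the sample values are illustrative assumptions and are not taken from the abstract. It is not the SmoothCruiser algorithm itself.

```python
import math

def hard_max_value(q_values):
    """Non-regularized backup: a plain maximum, which is not differentiable at ties."""
    return max(q_values)

def soft_max_value(q_values, lam):
    """Entropy-regularized backup (assumed form): lam * log(sum_a exp(Q(a) / lam)).

    The maximum is subtracted before exponentiating for numerical stability.
    """
    m = max(q_values)
    return m + lam * math.log(sum(math.exp((q - m) / lam) for q in q_values))

def soft_policy(q_values, lam):
    """Gradient of soft_max_value with respect to the action values (a softmax)."""
    m = max(q_values)
    weights = [math.exp((q - m) / lam) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical action values for a single state.
q = [1.0, 0.5, -0.2]
for lam in (1.0, 0.1, 0.01):
    print(lam, soft_max_value(q, lam), soft_policy(q, lam))
```

As `lam` shrinks, the regularized value approaches the plain maximum, while for any `lam > 0` it stays differentiable in the action values. For K actions it lies between the maximum and the maximum plus `lam * log(K)`.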