Budgeted Reinforcement Learning in Continuous State Space
Nicolas Carrara · Edouard Leurent · Romain Laroche · Tanguy Urvoy · Odalric-Ambrym Maillard · Olivier Pietquin

Wed Dec 11 05:00 PM -- 07:00 PM (PST) @ East Exhibition Hall B + C #202

A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented as an upper bound on a constraint violation signal that, importantly, can be modified in real time. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state of the art to continuous-space environments and unknown dynamics. We show that the solution to a BMDP is the fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs. We validate our approach on two simulated applications: spoken dialogue and autonomous driving.

Author Information

Nicolas Carrara (ULille)
Edouard Leurent (INRIA)

PhD student in Reinforcement Learning at the INRIA SequeL project (sequential learning), the INRIA Non-A project (finite-time control), and Renault Group.

Romain Laroche (Microsoft Research)
Tanguy Urvoy (Orange-Labs)
Odalric-Ambrym Maillard (INRIA)
Olivier Pietquin (Google Research Brain Team)
