Poster

Deterministic Policies for Constrained Reinforcement Learning in Polynomial-Time

Jeremy McMahan

West Ballroom A-D #6909
Wed 11 Dec 11 a.m. PST — 2 p.m. PST

Abstract:

We present a novel algorithm that efficiently computes near-optimal deterministic policies for constrained reinforcement learning (CRL) problems. Our approach combines three key ideas: (1) value-demand augmentation, (2) action-space approximate dynamic programming, and (3) time-space rounding. Under mild reward assumptions, our algorithm constitutes a fully polynomial-time approximation scheme (FPTAS) for a diverse class of cost criteria. This class requires that the cost of a policy can be computed recursively over both time and (state) space, which includes classical expectation, almost sure, and anytime constraints. Our work not only provides provably efficient algorithms to address real-world challenges in decision-making but also offers a unifying theory for the efficient computation of constrained deterministic policies.
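To make the augmentation-plus-rounding idea concrete, here is a minimal sketch (not the authors' exact algorithm): a finite-horizon tabular planner for a hard, almost-sure cost budget that augments the state with the discretized remaining budget and runs backward dynamic programming over the augmented space, yielding a deterministic policy. The toy MDP and the grid size `n_buckets` are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def solve_budgeted_mdp(R, C, P, H, budget, n_buckets=50):
    """Sketch of budget-augmented DP (illustrative, not the paper's FPTAS).
    R[s, a]: reward, C[s, a]: nonnegative cost, P[s, a, s']: transition prob.
    Returns a value table and a deterministic policy pi[h, s, b] over the
    augmented (state, remaining-budget-bucket) space."""
    S, A = R.shape
    grid = np.linspace(0.0, budget, n_buckets)      # rounded budget levels
    V = np.zeros((H + 1, S, n_buckets))             # terminal value is 0
    pi = np.zeros((H, S, n_buckets), dtype=int)

    for h in range(H - 1, -1, -1):
        for s in range(S):
            for b, rem in enumerate(grid):
                best_val, best_a = -np.inf, 0
                for a in range(A):
                    if C[s, a] > rem:               # action would break the budget
                        continue
                    # round the leftover budget down onto the grid (conservative)
                    nb = max(np.searchsorted(grid, rem - C[s, a], side="right") - 1, 0)
                    val = R[s, a] + P[s, a] @ V[h + 1, :, nb]
                    if val > best_val:
                        best_val, best_a = val, a
                if best_val == -np.inf:             # no feasible action from here
                    best_val = 0.0
                V[h, s, b] = best_val
                pi[h, s, b] = best_a
    return V, pi

# Tiny illustrative instance: 2 states, 2 actions, horizon 3, cost budget 1.
R = np.array([[1.0, 2.0], [0.5, 1.5]])
C = np.array([[0.0, 1.0], [0.0, 1.0]])
P = np.zeros((2, 2, 2)); P[:, 0, 0] = 1.0; P[:, 1, 1] = 1.0
V, pi = solve_budgeted_mdp(R, C, P, H=3, budget=1.0)
print("constrained deterministic value from s=0:", V[0, 0, -1])
```

A finer budget grid tightens the approximation at the cost of a larger augmented state space; the paper's FPTAS guarantee comes from choosing this discretization carefully, which the sketch above does not attempt.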
