Skip to yearly menu bar Skip to main content


Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems

Masatoshi Uehara · Ayush Sekhari · Jason Lee · Nathan Kallus · Wen Sun

Hall J (level 1) #518

Keywords: [ Provably efficient RL ] [ PAC RL ] [ Reinforcement Learning Theory ] [ POMDPs ]


We study Reinforcement Learning for partially observable systems using function approximation. We propose a new PO-bilinear framework, that is general enough to include models such as undercomplete tabular Partially Observable Markov Decision Processes (POMDPs), Linear Quadratic Gaussian (LQG), Predictive State Representations (PSRs), as well as a newly introduced model Hilbert Space Embeddings of POMDPs. Under this framework, we propose an actor-critic style algorithm that is capable to performing agnostic policy learning. Given a policy class that consists of memory based policies (i.e., policy that looks at a fixed-length window of recent observations), and a value function class that consists of functions taking both memory and future observations as inputs, our algorithm learns to compete against the best memory-based policy among the policy class. For certain examples such as undercomplete POMDPs and LQGs, by leveraging their special properties, our algorithm is even capable of competing against the globally optimal policy without paying an exponential dependence on the horizon.

Chat is not available.