Skip to yearly menu bar Skip to main content


Provably Efficient Offline Reinforcement Learning in Regular Decision Processes

Roberto Cipollone · Anders Jonsson · Alessandro Ronca · Mohammad Sadegh Talebi

Great Hall & Hall B1+B2 (level 1) #1405
[ ]
[ Paper [ Poster [ OpenReview
Tue 12 Dec 8:45 a.m. PST — 10:45 a.m. PST

Abstract: This paper deals with offline (or batch) Reinforcement Learning (RL) in episodic Regular Decision Processes (RDPs). RDPs are the subclass of Non-Markov Decision Processes where the dependency on the history of past events can be captured by a finite-state automaton. We consider a setting where the automaton that underlies the RDP is unknown, and a learner strives to learn a near-optimal policy using pre-collected data, in the form of non-Markov sequences of observations, without further exploration. We present RegORL, an algorithm that suitably combines automata learning techniques and state-of-the-art algorithms for offline RL in MDPs. RegORL has a modular design allowing one to use any off-the-shelf offline RL algorithm in MDPs. We report a non-asymptotic high-probability sample complexity bound for RegORL to yield an $\varepsilon$-optimal policy, which makes appear a notion of concentrability relevant for RDPs. Furthermore, we present a sample complexity lower bound for offline RL in RDPs. To our best knowledge, this is the first work presenting a provably efficient algorithm for offline learning in RDPs.

Chat is not available.