Timezone: »

Reinforcement Learning in Factored MDPs: Oracle-Efficient Algorithms and Tighter Regret Bounds for the Non-Episodic Setting
Ziping Xu · Ambuj Tewari

Tue Dec 08 09:00 AM -- 11:00 AM (PST) @ Poster Session 1 #495
We study reinforcement learning in non-episodic factored Markov decision processes (FMDPs). We propose two near-optimal and oracle-efficient algorithms for FMDPs. Assuming oracle access to an FMDP planner, they enjoy a Bayesian and a frequentist regret bound respectively, both of which reduce to the near-optimal bound $O(DS\sqrt{AT})$ for standard non-factored MDPs. We propose a tighter connectivity measure, factored span, for FMDPs and prove a lower bound that depends on the factored span rather than the diameter $D$. In order to decrease the gap between lower and upper bounds, we propose an adaptation of the REGAL.C algorithm whose regret bound depends on the factored span. Our oracle-efficient algorithms outperform previously proposed near-optimal algorithms on computer network administration simulations.

Author Information

Ziping Xu (University of Michigan)

My name is Ziping Xu. I am a fifth-year Ph.D. student in Statistics at the University of Michigan. My research interests are on sample efficient reinforcement learning and transfer learning, multitask learning. I am looking for research-orientated full-time job starting Fall 2023

Ambuj Tewari (University of Michigan)

More from the Same Authors