
Polynomial Time Reinforcement Learning in Factored State MDPs with Linear Value Functions
Siddartha Devic · Zihao Deng · Brendan Juba

Many reinforcement learning (RL) environments in practice feature enormous state spaces that can be described compactly by a "factored" structure, which may be modeled by Factored Markov Decision Processes (FMDPs). We present the first polynomial-time algorithm for RL in Factored State MDPs (generalizing FMDPs) that neither relies on an oracle planner nor requires a linear transition model; it only requires a linear value function with a suitable local basis with respect to the factorization, permitting efficient variable elimination. Under this assumption, we can solve this family of Factored State MDPs in polynomial time by constructing an efficient separation oracle for convex optimization. Importantly, and in contrast to prior work on FMDPs, we do not assume that the transitions on various factors are conditionally independent.
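The abstract's remark that a local basis permits efficient variable elimination can be illustrated with a small, generic sketch. This is not the paper's algorithm or its separation oracle. It only shows the standard idea that maximizing a sum of functions, each depending on a few state variables, can be done one variable at a time instead of enumerating every joint state. All names here (`max_sum_variable_elimination`, `max_sum_brute_force`, the toy `factors` and `domains`) are hypothetical and chosen for illustration.

```python
from itertools import product


def max_sum_variable_elimination(factors, domains, order):
    """Maximize a sum of local functions by eliminating one variable at a time.

    factors: list of (scope, table) pairs; scope is a tuple of variable names,
             table maps a tuple of values (in scope order) to a float.
    domains: dict mapping each variable name to its list of possible values.
    order:   elimination order; assumed to cover every variable.
    """
    factors = list(factors)
    for var in order:
        touching = [f for f in factors if var in f[0]]
        rest = [f for f in factors if var not in f[0]]
        # The new factor depends on every neighbor of `var`, but not on `var`.
        new_scope = tuple(sorted({v for scope, _ in touching for v in scope} - {var}))
        new_table = {}
        for assignment in product(*(domains[v] for v in new_scope)):
            ctx = dict(zip(new_scope, assignment))
            best = float("-inf")
            for val in domains[var]:
                ctx[var] = val
                total = sum(table[tuple(ctx[v] for v in scope)]
                            for scope, table in touching)
                best = max(best, total)
            new_table[assignment] = best
        factors = rest + [(new_scope, new_table)]
    # Every remaining factor now has an empty scope, i.e. it is a constant.
    return sum(table[()] for _, table in factors)


def max_sum_brute_force(factors, domains):
    """Reference answer: enumerate every joint assignment (exponential cost)."""
    variables = sorted(domains)
    best = float("-inf")
    for assignment in product(*(domains[v] for v in variables)):
        ctx = dict(zip(variables, assignment))
        total = sum(table[tuple(ctx[v] for v in scope)] for scope, table in factors)
        best = max(best, total)
    return best


# Toy example (made-up numbers): three binary variables in a chain.
domains = {"x1": [0, 1], "x2": [0, 1], "x3": [0, 1]}
factors = [
    (("x1", "x2"), {(0, 0): 1.0, (0, 1): 0.0, (1, 0): 2.0, (1, 1): -1.0}),
    (("x2", "x3"), {(0, 0): 0.5, (0, 1): 3.0, (1, 0): 4.0, (1, 1): 0.0}),
]
result = max_sum_variable_elimination(factors, domains, ["x1", "x2", "x3"])
```

The cost of each elimination step depends on the size of the intermediate scope rather than on the total number of variables, which is why small local scopes and a good elimination order matter in this kind of computation.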

Author Information

Siddartha Devic (University of Southern California)
Zihao Deng (Washington University in St. Louis)
Brendan Juba (Washington University in St. Louis)

More from the Same Authors

  • 2019: Poster Spotlights A (23 posters)
    DongHa Bahn · Xiaoran Xu · Shih-Chieh Su · Daniel Cunnington · Wonseok Hwang · Sarthak Dash · Alberto Camacho · Theodoros Salonidis · Shiyang Li · Yuyu Zhang · Habibeh Naderi · Zhe Zeng · Pasha Khosravi · Pedro Colon-Hernandez · Dimitris Diochnos · David Windridge · Robin Manhaeve · Vaishak Belle · Brendan Juba · Naveen Sundar Govindarajulu · Joe Bockhorst
  • 2019 Poster: Implicitly learning to reason in first-order logic
    Vaishak Belle · Brendan Juba
  • 2017: Poster Session - Session 2
    Ambrish Rawat · Armand Joulin · Peter A Jansen · Jay Yoon Lee · Muhao Chen · Frank F. Xu · Patrick Verga · Brendan Juba · Anca Dumitrache · Sharmistha Jat · Robert Logan · Dhanya Sridhar · Fan Yang · Rajarshi Das · Pouya Pezeshkpour · Nicholas Monath