Skip to yearly menu bar Skip to main content

Workshop: OPT 2023: Optimization for Machine Learning

On Optimization Formulations of Finite Horizon MDPs

Rajat Vadiraj Dwaraknath · Lexing Ying


In this paper, we extend the connection between linear programming formulations of MDPs and policy gradient methods for infinite horizon MDPs presented in (Ying, L., & Zhu, Y., 2020) to finite horizon MDPs. The main tool we use for this extension is a reduction from optimization formulations of finite horizon MDPs to infinite horizon MDPs. Additionally, we show using a reparameterization argument that the KKT conditions for the non-convex policy optimization problem for the finite horizon setting are sufficient for global optimality. Further, we use the reduction to extend the Quasi-Newton policy gradient algorithm of (Li et. al 2021) to the finite horizon case and achieve performance competitive with value iteration by exploiting backward induction for policy evaluation. To our knowledge, this serves as the first policy gradient-based method for finite horizon MDPs that is competitive with value iteration-based approaches.

Chat is not available.