
Policy gradient finds global optimum of nearly linear-quadratic control systems
Yinbin Han · Meisam Razaviyayn · Renyuan Xu
Event URL: https://openreview.net/forum?id=jXQOe5r0O3u

We explore reinforcement learning methods for finding the optimal policy in nearly linear-quadratic control systems. In particular, we consider a dynamic system composed of the sum of a linear and a nonlinear component, governed by a policy with the same structure. Assuming that the nonlinear component consists of kernels with small Lipschitz coefficients, we characterize the optimization landscape of the cost function. Although the resulting landscape is generally nonconvex, we show that the cost function is locally strongly convex and smooth around the global optimizer. In addition, we design a policy gradient algorithm with a carefully chosen initialization and prove that it converges to the globally optimal policy at a linear rate.
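To make the setting concrete, the following is a minimal sketch of gradient descent on a policy parameter for a toy scalar system. It is not the authors' algorithm. Everything in it is an illustrative assumption: the dynamics `x' = a*x + b*u + eps*sin(x)`, the cost weights, the horizon, the step size, and the initial gain `k0`. The policy here is purely linear (`u = -k*x`), whereas the abstract describes a policy with both linear and nonlinear parts, and the gradient is estimated by central finite differences rather than by the estimator used in the paper.

```python
import math

# Hypothetical scalar system: linear dynamics plus a small nonlinear term.
# eps * sin(x) has Lipschitz constant eps, mimicking a "small" nonlinearity.
A, B, EPS = 0.9, 0.5, 0.05
Q, R = 1.0, 0.1          # assumed state and control cost weights
X0, HORIZON = 1.0, 100   # assumed initial state and rollout length


def rollout_cost(k):
    """Finite-horizon quadratic cost of the linear policy u = -k * x."""
    x, cost = X0, 0.0
    for _ in range(HORIZON):
        u = -k * x
        cost += Q * x * x + R * u * u
        x = A * x + B * u + EPS * math.sin(x)
    return cost


def finite_difference_gradient(k, h=1e-5):
    """Central-difference estimate of d(cost)/dk."""
    return (rollout_cost(k + h) - rollout_cost(k - h)) / (2.0 * h)


def policy_gradient_descent(k0=1.0, step=0.2, iterations=100):
    """Plain gradient descent on the gain k, starting from a stabilizing k0."""
    k = k0
    costs = [rollout_cost(k)]
    for _ in range(iterations):
        k -= step * finite_difference_gradient(k)
        costs.append(rollout_cost(k))
    return k, costs


k_final, costs = policy_gradient_descent()
print(f"final gain: {k_final:.4f}")
print(f"cost: {costs[0]:.4f} -> {costs[-1]:.4f}")
```

The initial gain is chosen so that the closed loop is already stable and lies in a region where the cost is well conditioned, which loosely mirrors the role of initialization described in the abstract. In this toy example, starting from a poorly conditioned gain with the same step size can overshoot into an unstable closed loop.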

Author Information

Yinbin Han (University of Southern California)
Meisam Razaviyayn (University of Southern California)
Renyuan Xu (University of Southern California)
