
Variance Reduced Policy Evaluation with Smooth Function Approximation
Hoi-To Wai · Mingyi Hong · Zhuoran Yang · Zhaoran Wang · Kexin Tang

Wed Dec 11 10:45 AM -- 12:45 PM (PST) @ East Exhibition Hall B + C #213
Policy evaluation with smooth and nonlinear function approximation has shown great potential for reinforcement learning. Compared to linear function approximation, it allows for using a richer class of approximation functions such as neural networks. Traditional algorithms are based on two-timescale stochastic approximation, whose convergence rate is often slow. This paper focuses on an offline setting where a trajectory of $m$ state-action pairs is observed. We formulate the policy evaluation problem as a non-convex primal-dual, finite-sum optimization problem, whose primal sub-problem is non-convex and whose dual sub-problem is strongly concave. We propose a single-timescale primal-dual gradient algorithm with variance reduction, and show that it converges to an $\epsilon$-stationary point using $O(m/\epsilon)$ calls (in expectation) to a gradient oracle.
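The abstract's setup can be illustrated with a minimal sketch. This is not the authors' exact algorithm: it runs a single-timescale primal-dual gradient method with SVRG-style variance reduction on a synthetic finite-sum saddle problem of the form $\min_\theta \max_y \frac{1}{m}\sum_i y^\top(A_i\theta - b_i) - \frac{1}{2}\|y\|^2$ (a standard saddle-point reformulation of least-squares policy evaluation, with strongly concave dual). The matrices `A_i` and vectors `b_i` are synthetic stand-ins, not derived from actual state-action data.

```python
import numpy as np

# Hedged sketch (assumptions): synthetic per-sample terms A_i, b_i stand in
# for TD-style statistics; the saddle objective is
#   f(theta, y) = (1/m) sum_i [ y^T (A_i theta - b_i) ] - 0.5 ||y||^2,
# whose dual sub-problem is strongly concave in y.
rng = np.random.default_rng(0)
m, d = 64, 5
A = np.eye(d) + 0.1 * rng.normal(size=(m, d, d))  # well-conditioned on average
b = rng.normal(size=(m, d))

def sample_grads(theta, y, i):
    """Gradients of the i-th summand f_i(theta, y)."""
    g_theta = A[i].T @ y                 # d f_i / d theta
    g_y = A[i] @ theta - b[i] - y        # d f_i / d y
    return g_theta, g_y

def full_grads(theta, y):
    """Exact (full-sum) gradients, used for SVRG snapshots."""
    g_theta = np.mean(A, axis=0).T @ y
    g_y = np.mean(A @ theta - b, axis=0) - y
    return g_theta, g_y

theta, y = np.zeros(d), np.zeros(d)
eta = 0.05                               # one stepsize: single timescale
for epoch in range(50):
    snap_t, snap_y = theta.copy(), y.copy()      # SVRG snapshot point
    mu_t, mu_y = full_grads(snap_t, snap_y)      # full gradient at snapshot
    for _ in range(m):
        i = rng.integers(m)
        gt, gy = sample_grads(theta, y, i)
        st, sy = sample_grads(snap_t, snap_y, i)
        v_t = gt - st + mu_t             # variance-reduced primal gradient
        v_y = gy - sy + mu_y             # variance-reduced dual gradient
        theta -= eta * v_t               # primal descent step
        y += eta * v_y                   # dual ascent step (same stepsize)

gt, gy = full_grads(theta, y)
print(np.linalg.norm(gt), np.linalg.norm(gy))  # gradient norms, small near stationarity
```

Both updates share one stepsize, in contrast to the two-timescale schemes the abstract mentions, and the SVRG correction `g_i(w) - g_i(w_snap) + full_grad` makes the stochastic gradient's noise vanish as the iterates approach a stationary point.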

Author Information

Hoi-To Wai (The Chinese University of Hong Kong)
Mingyi Hong (University of Minnesota)
Zhuoran Yang (Princeton University)
Zhaoran Wang (Northwestern University)
Kexin Tang (Shanghai Jiao Tong University)