Regularized Policy Iteration
Amir-massoud Farahmand · Mohammad Ghavamzadeh · Csaba Szepesvari · Shie Mannor

Tue Dec 09 07:30 PM -- 12:00 AM (PST)

In this paper, we consider approximate policy-iteration-based reinforcement learning algorithms. To implement a flexible function approximation scheme, we propose the use of non-parametric methods with regularization, which provides a convenient way to control the complexity of the function approximator. We propose two novel regularized policy iteration algorithms by adding L2-regularization to two widely used policy evaluation methods: Bellman residual minimization (BRM) and least-squares temporal difference learning (LSTD). We derive efficient implementations of our algorithms for the case where the approximate value functions belong to a reproducing kernel Hilbert space. We also provide finite-sample performance bounds for our algorithms and show that they achieve optimal rates of convergence under the studied conditions.
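
To make the idea of L2-regularized policy evaluation concrete, the following is a minimal sketch of a ridge-style LSTD solve. It is not the paper's algorithm: it assumes a finite-dimensional feature map instead of a reproducing kernel Hilbert space, and it simply adds a ridge term to the LSTD matrix, which is a common simplification and may differ from the formulation the authors analyze. The names `regularized_lstd`, `solve`, `one_hot`, and the parameter `lam` are hypothetical and chosen for illustration only.

```python
def solve(A, b):
    """Solve the linear system A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [A | b] without mutating the inputs.
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def regularized_lstd(transitions, features, gamma, lam):
    """Ridge-style LSTD (illustrative assumption, not the paper's exact method).

    transitions: list of (state, reward, next_state) sampled under a fixed policy.
    features:    maps a state to a list of floats (hypothetical feature map).
    gamma:       discount factor.
    lam:         L2-regularization coefficient added to the diagonal of A.

    Solves (A + lam * I) theta = b with
        A = sum_t phi(s_t) (phi(s_t) - gamma * phi(s'_t))^T
        b = sum_t phi(s_t) r_t
    """
    d = len(features(transitions[0][0]))
    A = [[lam if i == j else 0.0 for j in range(d)] for i in range(d)]
    b = [0.0] * d
    for s, r, s_next in transitions:
        phi, phi_next = features(s), features(s_next)
        for i in range(d):
            b[i] += phi[i] * r
            for j in range(d):
                A[i][j] += phi[i] * (phi[j] - gamma * phi_next[j])
    return solve(A, b)


def one_hot(state, n_states=2):
    """Tabular features: one indicator per state."""
    return [1.0 if i == state else 0.0 for i in range(n_states)]


# Toy chain: state 0 moves to state 1 with reward 0; state 1 loops with reward 1.
toy_transitions = [(0, 0.0, 1), (1, 1.0, 1)]
theta_unregularized = regularized_lstd(toy_transitions, one_hot, gamma=0.5, lam=0.0)
theta_regularized = regularized_lstd(toy_transitions, one_hot, gamma=0.5, lam=0.5)
```

On this toy chain with `gamma=0.5`, the unregularized solve recovers the true values (1 for state 0, 2 for state 1), while a positive `lam` shrinks the estimates toward zero. That shrinkage is the complexity control the abstract refers to, shown here only in the simplest tabular setting.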

Author Information

Amir-massoud Farahmand (Vector Institute)
Mohammad Ghavamzadeh (Facebook AI Research)
Csaba Szepesvari (University of Alberta)
Shie Mannor (McGill University)
