Timezone: »

 
Poster
Finite-Sample Analysis for SARSA with Linear Function Approximation
Shaofeng Zou · Tengyu Xu · Yingbin Liang

Tue Dec 10 10:45 AM -- 12:45 PM (PST) @ East Exhibition Hall B + C #201

SARSA is an on-policy algorithm to learn a Markov decision process policy in reinforcement learning. We investigate the SARSA algorithm with linear function approximation under the non-i.i.d.\ setting, where a single sample trajectory is available. With a Lipschitz continuous policy improvement operator that is smooth enough, SARSA has been shown to converge asymptotically. However, its non-asymptotic analysis is challenging and remains unsolved due to the non-i.i.d. samples, and the fact that the behavior policy changes dynamically with time. In this paper, we develop a novel technique to explicitly characterize the stochastic bias of a type of stochastic approximation procedures with time-varying Markov transition kernels. Our approach enables non-asymptotic convergence analyses of this type of stochastic approximation algorithms, which may be of independent interest. Using our bias characterization technique and a gradient descent type of analysis, we further provide the finite-sample analysis on the mean square error of the SARSA algorithm. In the end, we present a fitted SARSA algorithm, which includes the original SARSA algorithm and its variant as special cases. This fitted SARSA algorithm provides a framework for \textit{iterative} on-policy fitted policy iteration, which is more memory and computationally efficient. For this fitted SARSA algorithm, we also present its finite-sample analysis.

Author Information

Shaofeng Zou (University at Buffalo, the State University of New York)
Tengyu Xu (The Ohio State University)
Yingbin Liang (The Ohio State University)

More from the Same Authors