Timezone: »
Poster
Sample Complexity of Asynchronous Q-Learning: Sharper Analysis and Variance Reduction
Gen Li · Yuting Wei · Yuejie Chi · Yuantao Gu · Yuxin Chen
Asynchronous Q-learning aims to learn the optimal action-value function (or Q-function) of a Markov decision process (MDP), based on a single trajectory of Markovian samples induced by a behavior policy. Focusing on a $\gamma$-discounted MDP with state space S and action space A, we demonstrate that the $ \ell_{\infty} $-based sample complexity of classical asynchronous Q-learning --- namely, the number of samples needed to yield an entrywise $\epsilon$-accurate estimate of the Q-function --- is at most on the order of $ \frac{1}{ \mu_{\min}(1-\gamma)^5 \epsilon^2 }+ \frac{ t_{\mathsf{mix}} }{ \mu_{\min}(1-\gamma) } $ up to some logarithmic factor, provided that a proper constant learning rate is adopted. Here, $ t_{\mathsf{mix}} $ and $ \mu_{\min} $ denote respectively the mixing time and the minimum state-action occupancy probability of the sample trajectory. The first term of this bound matches the complexity in the case with independent samples drawn from the stationary distribution of the trajectory. The second term reflects the expense taken for the empirical distribution of the Markovian trajectory to reach a steady state, which is incurred at the very beginning and becomes amortized as the algorithm runs. Encouragingly, the above bound improves upon the state-of-the-art result by a factor of at least |S||A|. Further, the scaling on the discount complexity can be improved by means of variance reduction.
Author Information
Gen Li (Tsinghua University)
Yuting Wei (Carnegie Mellon University)
Yuejie Chi (CMU)
Yuantao Gu (Tsinghua University)
Yuxin Chen (Princeton University)
More from the Same Authors
-
2021 Spotlight: Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning »
Gen Li · Laixi Shi · Yuxin Chen · Yuantao Gu · Yuejie Chi -
2021 : DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization »
Boyue Li · Zhize Li · Yuejie Chi -
2021 : DESTRESS: Computation-Optimal and Communication-Efficient Decentralized Nonconvex Finite-Sum Optimization »
Boyue Li · Zhize Li · Yuejie Chi -
2021 : Policy Mirror Descent for Regularized RL: A Generalized Framework with Linear Convergence »
Wenhao Zhan · Shicong Cen · Baihe Huang · Yuxin Chen · Jason Lee · Yuejie Chi -
2021 : Policy Mirror Descent for Regularized RL: A Generalized Framework with Linear Convergence »
Wenhao Zhan · Shicong Cen · Baihe Huang · Yuxin Chen · Jason Lee · Yuejie Chi -
2022 : A Multi-Token Coordinate Descent Method for Vertical Federated Learning »
Pedro Valdeira · Yuejie Chi · Claudia Soares · Joao Xavier -
2022 Poster: BEER: Fast $O(1/T)$ Rate for Decentralized Nonconvex Optimization with Communication Compression »
Haoyu Zhao · Boyue Li · Zhize Li · Peter Richtarik · Yuejie Chi -
2022 Poster: Minimax-Optimal Multi-Agent RL in Markov Games With a Generative Model »
Gen Li · Yuejie Chi · Yuting Wei · Yuxin Chen -
2022 Poster: SoteriaFL: A Unified Framework for Private Federated Learning with Communication Compression »
Zhize Li · Haoyu Zhao · Boyue Li · Yuejie Chi -
2021 Poster: Fast Policy Extragradient Methods for Competitive Games with Entropy Regularization »
Shicong Cen · Yuting Wei · Yuejie Chi -
2021 Poster: Breaking the Sample Complexity Barrier to Regret-Optimal Model-Free Reinforcement Learning »
Gen Li · Laixi Shi · Yuxin Chen · Yuantao Gu · Yuejie Chi -
2021 Poster: Sample-Efficient Reinforcement Learning Is Feasible for Linearly Realizable MDPs with Limited Revisiting »
Gen Li · Yuxin Chen · Yuejie Chi · Yuantao Gu · Yuting Wei -
2020 Poster: Randomized tests for high-dimensional regression: A more efficient and powerful solution »
Yue Li · Ilmun Kim · Yuting Wei -
2020 Poster: Breaking the Sample Size Barrier in Model-Based Reinforcement Learning with a Generative Model »
Gen Li · Yuting Wei · Yuejie Chi · Yuantao Gu · Yuxin Chen -
2019 Poster: Nonconvex Low-Rank Symmetric Tensor Completion from Noisy Data »
Changxiao Cai · Gen Li · H. Vincent Poor · Yuxin Chen