Timezone: »
Poster
Finite-Time Analysis for Double Q-learning
Huaqing Xiong · Lin Zhao · Yingbin Liang · Wei Zhang
Although Q-learning is one of the most successful algorithms for finding the best action-value function (and thus the optimal policy) in reinforcement learning, its implementation often suffers from large overestimation of Q-function values incurred by random sampling. The double Q-learning algorithm proposed in~\citet{hasselt2010double} overcomes such an overestimation issue by randomly switching the update between two Q-estimators, and has thus gained significant popularity in practice. However, the theoretical understanding of double Q-learning is rather limited. So far only the asymptotic convergence has been established, which does not characterize how fast the algorithm converges. In this paper, we provide the first non-asymptotic (i.e., finite-time) analysis for double Q-learning. We show that both synchronous and asynchronous double Q-learning are guaranteed to converge to an $\epsilon$-accurate neighborhood of the global optimum by taking
$\tilde{\Omega}\left(\left( \frac{1}{(1-\gamma)^6\epsilon^2}\right)^{\frac{1}{\omega}} +\left(\frac{1}{1-\gamma}\right)^{\frac{1}{1-\omega}}\right)$ iterations, where $\omega\in(0,1)$ is the decay parameter of the learning rate, and $\gamma$ is the discount factor. Our analysis develops novel techniques to derive finite-time bounds on the difference between two inter-connected stochastic processes, which is new to the literature of stochastic approximation.
Author Information
Huaqing Xiong (Ohio State University)
Lin Zhao (National University of Singapore)
Yingbin Liang (The Ohio State University)
Wei Zhang (Southern University of Science and Technology)
Related Events (a corresponding poster, oral, or spotlight)
-
2020 Spotlight: Finite-Time Analysis for Double Q-learning »
Tue. Dec 8th 03:20 -- 03:30 PM Room Orals & Spotlights: Reinforcement Learning
More from the Same Authors
-
2021 Spotlight: Provably Faster Algorithms for Bilevel Optimization »
Junjie Yang · Kaiyi Ji · Yingbin Liang -
2022 Poster: Provable Generalization of Overparameterized Meta-learning Trained with SGD »
Yu Huang · Yingbin Liang · Longbo Huang -
2022 : Online Min-max Optimization: Nonconvexity, Nonstationarity, and Dynamic Regret »
Yu Huang · Yuan Cheng · Yingbin Liang · Longbo Huang -
2022 Spotlight: Will Bilevel Optimizers Benefit from Loops »
Kaiyi Ji · Mingrui Liu · Yingbin Liang · Lei Ying -
2022 Spotlight: Lightning Talks 3B-2 »
Yu Huang · Tero Karras · Maxim Kodryan · Shiau Hong Lim · Shudong Huang · Ziyu Wang · Siqiao Xue · ILYAS MALIK · Ekaterina Lobacheva · Miika Aittala · Hongjie Wu · Yuhao Zhou · Yingbin Liang · Xiaoming Shi · Jun Zhu · Maksim Nakhodnov · Timo Aila · Yazhou Ren · James Zhang · Longbo Huang · Dmitry Vetrov · Ivor Tsang · Hongyuan Mei · Samuli Laine · Zenglin Xu · Wentao Feng · Jiancheng Lv -
2022 Spotlight: Provable Generalization of Overparameterized Meta-learning Trained with SGD »
Yu Huang · Yingbin Liang · Longbo Huang -
2022 Spotlight: Lightning Talks 1A-3 »
Kimia Noorbakhsh · Ronan Perry · Qi Lyu · Jiawei Jiang · Christian Toth · Olivier Jeunen · Xin Liu · Yuan Cheng · Lei Li · Manuel Rodriguez · Julius von Kügelgen · Lars Lorch · Nicolas Donati · Lukas Burkhalter · Xiao Fu · Zhongdao Wang · Songtao Feng · Ciarán Gilligan-Lee · Rishabh Mehrotra · Fangcheng Fu · Jing Yang · Bernhard Schölkopf · Ya-Li Li · Christian Knoll · Maks Ovsjanikov · Andreas Krause · Shengjin Wang · Hong Zhang · Mounia Lalmas · Bolin Ding · Bo Du · Yingbin Liang · Franz Pernkopf · Robert Peharz · Anwar Hithnawi · Julius von Kügelgen · Bo Li · Ce Zhang -
2022 Spotlight: Provable Benefit of Multitask Representation Learning in Reinforcement Learning »
Yuan Cheng · Songtao Feng · Jing Yang · Hong Zhang · Yingbin Liang -
2022 Poster: A Unifying Framework of Off-Policy General Value Function Evaluation »
Tengyu Xu · Zhuoran Yang · Zhaoran Wang · Yingbin Liang -
2022 Poster: On the Convergence Theory for Hessian-Free Bilevel Algorithms »
Daouda Sow · Kaiyi Ji · Yingbin Liang -
2022 Poster: Provable Benefit of Multitask Representation Learning in Reinforcement Learning »
Yuan Cheng · Songtao Feng · Jing Yang · Hong Zhang · Yingbin Liang -
2022 Poster: Will Bilevel Optimizers Benefit from Loops »
Kaiyi Ji · Mingrui Liu · Yingbin Liang · Lei Ying -
2021 Poster: Faster Non-asymptotic Convergence for Double Q-learning »
Lin Zhao · Huaqing Xiong · Yingbin Liang -
2021 Poster: Provably Faster Algorithms for Bilevel Optimization »
Junjie Yang · Kaiyi Ji · Yingbin Liang -
2020 Poster: Convergence of Meta-Learning with Task-Specific Adaptation over Partial Parameters »
Kaiyi Ji · Jason Lee · Yingbin Liang · H. Vincent Poor -
2020 Poster: Improving Sample Complexity Bounds for (Natural) Actor-Critic Algorithms »
Tengyu Xu · Zhe Wang · Yingbin Liang -
2019 Poster: SpiderBoost and Momentum: Faster Variance Reduction Algorithms »
Zhe Wang · Kaiyi Ji · Yi Zhou · Yingbin Liang · Vahid Tarokh -
2019 Poster: Finite-Sample Analysis for SARSA with Linear Function Approximation »
Shaofeng Zou · Tengyu Xu · Yingbin Liang -
2019 Poster: Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples »
Tengyu Xu · Shaofeng Zou · Yingbin Liang -
2018 Poster: Convergence of Cubic Regularization for Nonconvex Optimization under KL Property »
Yi Zhou · Zhe Wang · Yingbin Liang -
2018 Spotlight: Convergence of Cubic Regularization for Nonconvex Optimization under KL Property »
Yi Zhou · Zhe Wang · Yingbin Liang -
2018 Poster: Minimax Estimation of Neural Net Distance »
Kaiyi Ji · Yingbin Liang