In this paper, we establish a theoretical comparison between the asymptotic mean-square errors of Double Q-learning and Q-learning. Our result builds upon an analysis of linear stochastic approximation based on Lyapunov equations and applies to both the tabular setting and the setting with linear function approximation, provided that the optimal policy is unique and the algorithms converge. We show that the asymptotic mean-square error of Double Q-learning is exactly equal to that of Q-learning if Double Q-learning uses twice the learning rate of Q-learning and the output of Double Q-learning is the average of its two estimators. We also present some practical implications of this theoretical observation using simulations.
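The comparison described in the abstract can be illustrated with a minimal tabular sketch. It is not the authors' experimental setup: the small random MDP, the uniform sampling of state-action pairs, the deterministic rewards, and the step-size schedule `step_size` are all assumptions made for illustration. The only elements taken from the abstract are that Double Q-learning runs with twice the learning rate of Q-learning and reports the average of its two estimators.

```python
import numpy as np

def make_mdp(n_states=3, n_actions=2, seed=0):
    """Hypothetical random MDP: P has shape (S, A, S), R has shape (S, A)."""
    rng = np.random.default_rng(seed)
    P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))
    R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
    return P, R

def step_size(t, scale):
    """Assumed decaying schedule; `scale` is 1 for Q-learning, 2 for Double Q."""
    return scale * 10.0 / (t + 100.0)

def optimal_q(P, R, gamma, n_iters=2000):
    """Reference Q* computed by value iteration."""
    Q = np.zeros_like(R)
    for _ in range(n_iters):
        Q = R + gamma * P @ Q.max(axis=1)
    return Q

def q_learning(P, R, gamma, n_steps, seed):
    rng = np.random.default_rng(seed)
    S, A = R.shape
    Q = np.zeros((S, A))
    for t in range(n_steps):
        s, a = rng.integers(S), rng.integers(A)   # uniform (s, a) sampling
        s_next = rng.choice(S, p=P[s, a])
        target = R[s, a] + gamma * Q[s_next].max()
        Q[s, a] += step_size(t, 1.0) * (target - Q[s, a])
    return Q

def double_q_learning(P, R, gamma, n_steps, seed):
    rng = np.random.default_rng(seed)
    S, A = R.shape
    QA, QB = np.zeros((S, A)), np.zeros((S, A))
    for t in range(n_steps):
        s, a = rng.integers(S), rng.integers(A)
        s_next = rng.choice(S, p=P[s, a])
        # Update one estimator at random; the other evaluates the greedy action.
        upd, other = (QA, QB) if rng.random() < 0.5 else (QB, QA)
        a_star = np.argmax(upd[s_next])
        target = R[s, a] + gamma * other[s_next, a_star]
        # Twice the Q-learning step size, as in the abstract's condition.
        upd[s, a] += step_size(t, 2.0) * (target - upd[s, a])
    return (QA + QB) / 2.0                        # averaged output

if __name__ == "__main__":
    P, R = make_mdp()
    gamma, n_steps, n_runs = 0.9, 20000, 20
    Q_star = optimal_q(P, R, gamma)
    mse_q = np.mean([np.sum((q_learning(P, R, gamma, n_steps, i) - Q_star) ** 2)
                     for i in range(n_runs)])
    mse_dq = np.mean([np.sum((double_q_learning(P, R, gamma, n_steps, i) - Q_star) ** 2)
                      for i in range(n_runs)])
    print("Q-learning squared error:       ", mse_q)
    print("Double Q-learning squared error:", mse_dq)
```

Because the stated equality is asymptotic, a finite run like this one should only be expected to show the two empirical errors becoming comparable as the number of steps and independent runs grows, not matching exactly.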
Author Information
Wentao Weng (Tsinghua University)
Harsh Gupta (University of Illinois at Urbana-Champaign)
Niao He (ETH Zurich)
Lei Ying (University of Michigan)
R. Srikant (University of Illinois at Urbana-Champaign)
More from the Same Authors
2023 : Model-Free, Regret-Optimal Best Policy Identification in Online CMDPs »
Zihan Zhou · Honghao Wei · Lei Ying -
2023 Poster: Fast and Regret Optimal Best Arm Identification: Fundamental Limits and Low-Complexity Algorithms »
Qining Zhang · Lei Ying -
2023 Poster: Performance Bounds for Policy-Based Average Reward Reinforcement Learning Algorithms »
Yashaswini Murthy · Mehrdad Moharrami · R. Srikant -
2023 Poster: Sample Efficient Reinforcement Learning in Mixed Systems through Augmented Samples and Its Applications to Queueing Networks »
Honghao Wei · Xin Liu · Weina Wang · Lei Ying -
2022 Spotlight: Will Bilevel Optimizers Benefit from Loops »
Kaiyi Ji · Mingrui Liu · Yingbin Liang · Lei Ying -
2022 Poster: Minimax Regret for Cascading Bandits »
Daniel Vial · Sujay Sanghavi · Sanjay Shakkottai · R. Srikant -
2022 Poster: Online Convex Optimization with Hard Constraints: Towards the Best of Two Worlds and Beyond »
Hengquan Guo · Xin Liu · Honghao Wei · Lei Ying -
2022 Poster: Will Bilevel Optimizers Benefit from Loops »
Kaiyi Ji · Mingrui Liu · Yingbin Liang · Lei Ying -
2021 Poster: On the Bias-Variance-Cost Tradeoff of Stochastic Optimization »
Yifan Hu · Xin Chen · Niao He -
2021 Poster: An Efficient Pessimistic-Optimistic Algorithm for Stochastic Linear Bandits with General Constraints »
Xin Liu · Bin Li · Pengyi Shi · Lei Ying -
2020 Poster: Biased Stochastic First-Order Methods for Conditional Stochastic Optimization and Applications in Meta Learning »
Yifan Hu · Siqi Zhang · Xin Chen · Niao He -
2020 Poster: A Catalyst Framework for Minimax Optimization »
Junchi Yang · Siqi Zhang · Negar Kiyavash · Niao He -
2020 Poster: Global Convergence and Variance Reduction for a Class of Nonconvex-Nonconcave Minimax Problems »
Junchi Yang · Negar Kiyavash · Niao He -
2020 Poster: A Unified Switching System Perspective and Convergence Analysis of Q-Learning Algorithms »
Donghwan Lee · Niao He -
2020 Poster: The Devil is in the Detail: A Framework for Macroscopic Prediction via Microscopic Models »
Yingxiang Yang · Negar Kiyavash · Le Song · Niao He -
2020 Spotlight: The Devil is in the Detail: A Framework for Macroscopic Prediction via Microscopic Models »
Yingxiang Yang · Negar Kiyavash · Le Song · Niao He -
2019 Poster: Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning »
Harsh Gupta · R. Srikant · Lei Ying -
2018 Poster: Adding One Neuron Can Eliminate All Bad Local Minima »
Shiyu Liang · Ruoyu Sun · Jason Lee · R. Srikant -
2015 Poster: Algorithms with Logarithmic or Sublinear Regret for Constrained Contextual Bandits »
Huasen Wu · R. Srikant · Xin Liu · Chong Jiang