Poster
The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition
Tiancheng Jin · Longbo Huang · Haipeng Luo
We consider the best-of-both-worlds problem for learning an episodic Markov Decision Process through $T$ episodes, with the goal of achieving $\widetilde{\mathcal{O}}(\sqrt{T})$ regret when the losses are adversarial and simultaneously $\mathcal{O}(\log T)$ regret when the losses are (almost) stochastic. Recent work by [Jin and Luo, 2020] achieves this goal when the fixed transition is known, and leaves the case of unknown transition as a major open question. In this work, we resolve this open problem by using the same Follow-the-Regularized-Leader (FTRL) framework together with a set of new techniques. Specifically, we first propose a loss-shifting trick in the FTRL analysis, which greatly simplifies the approach of [Jin and Luo, 2020] and already improves their results for the known transition case. Then, we extend this idea to the unknown transition case and develop a novel analysis which upper bounds the transition estimation error by the regret itself in the stochastic setting, a key property to ensure $\mathcal{O}(\log T)$ regret.
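For readers unfamiliar with the FTRL framework referenced above, the following is a generic sketch (not the paper's exact algorithm) of an FTRL update over occupancy measures, together with the kind of loss-shifting invariance the abstract alludes to; the symbols $\widehat{q}_t$ (occupancy-measure iterate), $\widehat{\ell}_\tau$ (loss estimators), $\psi$ (regularizer), $\Omega$ (set of valid occupancy measures), and $g_\tau$ (shift function) are illustrative notation, not taken from the paper:
$$\widehat{q}_{t+1} = \operatorname*{argmin}_{q \in \Omega} \Big\langle q, \sum_{\tau \le t} \widehat{\ell}_\tau \Big\rangle + \psi(q).$$
A loss shift replaces each $\widehat{\ell}_\tau$ by $\widehat{\ell}_\tau - g_\tau$ for a shift function $g_\tau$ whose inner product $\langle q, g_\tau \rangle$ takes the same value for every $q \in \Omega$ (for instance, by exploiting the flow constraints defining $\Omega$). This leaves the argmin, and hence the algorithm, unchanged, while the regret analysis is free to work with whichever shifted losses are most convenient.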
Author Information
Tiancheng Jin (University of Southern California)
Longbo Huang (IIIS, Tsinghua University)
Haipeng Luo (University of Southern California)
Related Events (a corresponding poster, oral, or spotlight)
- 2021 Oral: The best of both worlds: stochastic and adversarial episodic MDPs with unknown transition
  Tue. Dec 7th 09:40 -- 09:55 AM
More from the Same Authors
- 2021: Plan Better Amid Conservatism: Offline Multi-Agent Reinforcement Learning with Actor Rectification
  Ling Pan · Longbo Huang · Tengyu Ma · Huazhe Xu
- 2021 Poster: Multi-Agent Reinforcement Learning in Stochastic Networked Systems
  Yiheng Lin · Guannan Qu · Longbo Huang · Adam Wierman
- 2021 Poster: Last-iterate Convergence in Extensive-Form Games
  Chung-Wei Lee · Christian Kroer · Haipeng Luo
- 2021 Poster: Continuous Mean-Covariance Bandits
  Yihan Du · Siwei Wang · Zhixuan Fang · Longbo Huang
- 2021 Poster: Implicit Finite-Horizon Approximation and Efficient Optimal Algorithms for Stochastic Shortest Path
  Liyu Chen · Mehdi Jafarnia-Jahromi · Rahul Jain · Haipeng Luo
- 2021 Poster: Fast Federated Learning in the Presence of Arbitrary Device Unavailability
  Xinran Gu · Kaixuan Huang · Jingzhao Zhang · Longbo Huang
- 2021 Poster: What Makes Multi-Modal Learning Better than Single (Provably)
  Yu Huang · Chenzhuang Du · Zihui Xue · Xuanyao Chen · Hang Zhao · Longbo Huang
- 2021 Poster: Regularized Softmax Deep Multi-Agent Q-Learning
  Ling Pan · Tabish Rashid · Bei Peng · Longbo Huang · Shimon Whiteson
- 2021 Poster: Policy Optimization in Adversarial MDPs: Improved Exploration via Dilated Bonuses
  Haipeng Luo · Chen-Yu Wei · Chung-Wei Lee
- 2020 Poster: Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs
  Chung-Wei Lee · Haipeng Luo · Chen-Yu Wei · Mengxiao Zhang
- 2020 Poster: Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition
  Tiancheng Jin · Haipeng Luo
- 2020 Spotlight: Simultaneously Learning Stochastic and Adversarial Episodic MDPs with Known Transition
  Tiancheng Jin · Haipeng Luo
- 2020 Oral: Bias no more: high-probability data-dependent regret bounds for adversarial bandits and MDPs
  Chung-Wei Lee · Haipeng Luo · Chen-Yu Wei · Mengxiao Zhang
- 2020 Poster: Softmax Deep Double Deterministic Policy Gradients
  Ling Pan · Qingpeng Cai · Longbo Huang
- 2020 Poster: Restless-UCB, an Efficient and Low-complexity Algorithm for Online Restless Bandits
  Siwei Wang · Longbo Huang · John C. S. Lui
- 2020 Poster: Comparator-Adaptive Convex Bandits
  Dirk van der Hoeven · Ashok Cutkosky · Haipeng Luo
- 2019 Poster: Equipping Experts/Bandits with Long-term Memory
  Kai Zheng · Haipeng Luo · Ilias Diakonikolas · Liwei Wang
- 2019 Poster: Model Selection for Contextual Bandits
  Dylan Foster · Akshay Krishnamurthy · Haipeng Luo
- 2019 Spotlight: Model Selection for Contextual Bandits
  Dylan Foster · Akshay Krishnamurthy · Haipeng Luo
- 2019 Poster: Double Quantization for Communication-Efficient Distributed Optimization
  Yue Yu · Jiaxiang Wu · Longbo Huang
- 2019 Poster: Hypothesis Set Stability and Generalization
  Dylan Foster · Spencer Greenberg · Satyen Kale · Haipeng Luo · Mehryar Mohri · Karthik Sridharan
- 2018 Poster: Multi-armed Bandits with Compensation
  Siwei Wang · Longbo Huang