Poster
Almost Optimal Model-Free Reinforcement Learning via Reference-Advantage Decomposition
Zihan Zhang · Yuan Zhou · Xiangyang Ji
We study the reinforcement learning problem in the setting of finite-horizon episodic Markov Decision Processes (MDPs) with S states, A actions, and episode length H. We propose a model-free algorithm, UCB-ADVANTAGE, and prove that it achieves \tilde{O}(\sqrt{H^2 SAT}) regret, where T = KH and K is the number of episodes to play. Our regret bound improves upon the results of [Jin et al., 2018] and matches the best known model-based algorithms as well as the information-theoretic lower bound up to logarithmic factors. We also show that UCB-ADVANTAGE achieves low local switching cost and applies to concurrent reinforcement learning, improving upon the recent results of [Bai et al., 2019].
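To put the stated bound in context, the comparison below is a rough sketch rather than a statement taken from this page. It assumes the commonly cited figures from the literature: the \tilde{O}(\sqrt{H^3 SAT}) regret of the Bernstein-bonus variant of Q-learning in [Jin et al., 2018], and the \Omega(\sqrt{H^2 SAT}) lower bound under the same convention T = KH. The symbol \mathrm{Regret}(T) is introduced here only for illustration.

```latex
% Assumed prior model-free bound (UCB-Bernstein, Jin et al., 2018)
\mathrm{Regret}_{\text{prior}}(T) = \tilde{O}\!\left(\sqrt{H^{3} S A T}\right)

% Bound stated in the abstract for UCB-ADVANTAGE
\mathrm{Regret}_{\text{UCB-ADVANTAGE}}(T) = \tilde{O}\!\left(\sqrt{H^{2} S A T}\right)

% Assumed information-theoretic lower bound
\mathrm{Regret}(T) = \Omega\!\left(\sqrt{H^{2} S A T}\right)

% Ratio of the two upper bounds, ignoring logarithmic factors
\frac{\sqrt{H^{3} S A T}}{\sqrt{H^{2} S A T}} = \sqrt{H}
```

Under these assumptions, the improvement over the prior model-free bound is a factor of \sqrt{H}, which closes the gap to the lower bound up to logarithmic factors.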
Author Information
Zihan Zhang (Tsinghua University)
Yuan Zhou (UIUC)
Xiangyang Ji (Tsinghua University)
More from the Same Authors
- 2021: Imitation Learning from Observations under Transition Model Disparity
  Tanmay Gangwani · Yuan Zhou · Jian Peng
- 2022 Poster: Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning
  Zihan Zhang · Yuhang Jiang · Yuan Zhou · Xiangyang Ji
- 2022 Poster: Self-Organized Group for Cooperative Multi-agent Reinforcement Learning
  Jianzhun Shao · Zhiqiang Lou · Hongchang Zhang · Yuhang Jiang · Shuncheng He · Xiangyang Ji
- 2022 Poster: SPD: Synergy Pattern Diversifying Oriented Unsupervised Multi-agent Reinforcement Learning
  Yuhang Jiang · Jianzhun Shao · Shuncheng He · Hongchang Zhang · Xiangyang Ji
- 2022: An Empirical Study on Distribution Shift Robustness From the Perspective of Pre-Training and Data Augmentation
  Ziquan Liu · Yi Xu · Yuanhong Xu · Qi Qian · Hao Li · Rong Jin · Xiangyang Ji · Antoni Chan
- 2022 Spotlight: Lightning Talks 5A-3
  Minting Pan · Xiang Chen · Wenhan Huang · Can Chang · Zhecheng Yuan · Jianzhun Shao · Yushi Cao · Peihao Chen · Ke Xue · Zhengrong Xue · Zhiqiang Lou · Xiangming Zhu · Lei Li · Zhiming Li · Kai Li · Jiacheng Xu · Dongyu Ji · Ni Mu · Kun Shao · Tianpei Yang · Kunyang Lin · Ningyu Zhang · Yunbo Wang · Lei Yuan · Bo Yuan · Hongchang Zhang · Jiajun Wu · Tianze Zhou · Xueqian Wang · Ling Pan · Yuhang Jiang · Xiaokang Yang · Xiaozhuan Liang · Hao Zhang · Weiwen Hu · Miqing Li · YAN ZHENG · Matthew Taylor · Huazhe Xu · Shumin Deng · Chao Qian · YI WU · Shuncheng He · Wenbing Huang · Chuanqi Tan · Zongzhang Zhang · Yang Gao · Jun Luo · Yi Li · Xiangyang Ji · Thomas Li · Mingkui Tan · Fei Huang · Yang Yu · Huazhe Xu · Dongge Wang · Jianye Hao · Chuang Gan · Yang Liu · Luo Si · Hangyu Mao · Huajun Chen · Jianye Hao · Jun Wang · Xiaotie Deng
- 2022 Spotlight: Self-Organized Group for Cooperative Multi-agent Reinforcement Learning
  Jianzhun Shao · Zhiqiang Lou · Hongchang Zhang · Yuhang Jiang · Shuncheng He · Xiangyang Ji
- 2022 Spotlight: Lightning Talks 2A-3
  David Buterez · Chengan He · Xuan Kan · Yutong Lin · Konstantin Schürholt · Yu Yang · Louis Annabi · Wei Dai · Xiaotian Cheng · Alexandre Pitti · Ze Liu · Jon Paul Janet · Jun Saito · Boris Knyazev · Mathias Quoy · Zheng Zhang · James Zachary · Steven J Kiddle · Xavier Giro-i-Nieto · Chang Liu · Hejie Cui · Zilong Zhang · Hakan Bilen · Damian Borth · Dino Oglic · Holly Rushmeier · Han Hu · Xiangyang Ji · Yi Zhou · Nanning Zheng · Ying Guo · Pietro Liò · Stephen Lin · Carl Yang · Yue Cao
- 2022 Spotlight: Distilling Representations from GAN Generator via Squeeze and Span
  Yu Yang · Xiaotian Cheng · Chang Liu · Hakan Bilen · Xiangyang Ji
- 2022 Spotlight: Lightning Talks 1B-4
  Andrei Atanov · Shiqi Yang · Wanshan Li · Yongchang Hao · Ziquan Liu · Jiaxin Shi · Anton Plaksin · Jiaxiang Chen · Ziqi Pan · yaxing wang · Yuxin Liu · Stepan Martyanov · Alessandro Rinaldo · Yuhao Zhou · Li Niu · Qingyuan Yang · Andrei Filatov · Yi Xu · Liqing Zhang · Lili Mou · Ruomin Huang · Teresa Yeo · kai wang · Daren Wang · Jessica Hwang · Yuanhong Xu · Qi Qian · Hu Ding · Michalis Titsias · Shangling Jui · Ajay Sohmshetty · Lester Mackey · Joost van de Weijer · Hao Li · Amir Zamir · Xiangyang Ji · Antoni Chan · Rong Jin
- 2022 Spotlight: Improved Fine-Tuning by Better Leveraging Pre-Training Data
  Ziquan Liu · Yi Xu · Yuanhong Xu · Qi Qian · Hao Li · Xiangyang Ji · Antoni Chan · Rong Jin
- 2022 Poster: Distilling Representations from GAN Generator via Squeeze and Span
  Yu Yang · Xiaotian Cheng · Chang Liu · Hakan Bilen · Xiangyang Ji
- 2022 Poster: Improved Fine-Tuning by Better Leveraging Pre-Training Data
  Ziquan Liu · Yi Xu · Yuanhong Xu · Qi Qian · Hao Li · Xiangyang Ji · Antoni Chan · Rong Jin
- 2021 Poster: Improved Variance-Aware Confidence Sets for Linear Bandits and Linear Mixture MDP
  Zihan Zhang · Jiaqi Yang · Xiangyang Ji · Simon Du
- 2021 Poster: TransMIL: Transformer based Correlated Multiple Instance Learning for Whole Slide Image Classification
  Zhuchen Shao · Hao Bian · Yang Chen · Yifeng Wang · Jian Zhang · Xiangyang Ji · yongbing zhang
- 2020 Poster: Learning Guidance Rewards with Trajectory-space Smoothing
  Tanmay Gangwani · Yuan Zhou · Jian Peng
- 2019 Poster: Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function
  Zihan Zhang · Xiangyang Ji
- 2019 Poster: Thresholding Bandit with Optimal Aggregate Regret
  Chao Tao · Saúl Blanco · Jian Peng · Yuan Zhou
- 2019 Poster: Exploration via Hindsight Goal Generation
  Zhizhou Ren · Kefan Dong · Yuan Zhou · Qiang Liu · Jian Peng