Timezone: »
Poster
Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function
Zihan Zhang · Xiangyang Ji
Tue Dec 10 05:30 PM -- 07:30 PM (PST) @ East Exhibition Hall B + C #211
We present an algorithm based on the \emph{Optimism in the Face of Uncertainty} (OFU) principle which is able to learn Reinforcement Learning (RL) modeled by Markov decision process (MDP) with finite state-action space efficiently.
By evaluating the state-pair difference of the optimal bias function $h^{*}$, the proposed algorithm achieves a regret bound of $\tilde{O}(\sqrt{SAHT})$\footnote{The symbol $\tilde{O}$ means $O$ with log factors ignored. } for MDP with $S$ states and $A$ actions, in the case that an upper bound $H$ on the span of $h^{*}$, i.e., $sp(h^{*})$ is known.
This result outperforms the best previous regret bounds $\tilde{O}(S\sqrt{AHT}) $\citep{fruit2019improved} by a factor of $\sqrt{S}$.
Furthermore, this regret bound matches the lower bound of $\Omega(\sqrt{SAHT}) $\citep{jaksch2010near} up to a logarithmic factor. As a consequence, we show that there is a near optimal regret bound of $\tilde{O}(\sqrt{SADT})$ for MDPs with a finite diameter $D$ compared to the lower bound of $\Omega(\sqrt{SADT}) $\citep{jaksch2010near}.
Author Information
Zihan Zhang (Tsinghua University)
Xiangyang Ji (Tsinghua University)
More from the Same Authors
-
2022 Poster: Near-Optimal Regret Bounds for Multi-batch Reinforcement Learning »
Zihan Zhang · Yuhang Jiang · Yuan Zhou · Xiangyang Ji -
2022 Poster: Self-Organized Group for Cooperative Multi-agent Reinforcement Learning »
Jianzhun Shao · Zhiqiang Lou · Hongchang Zhang · Yuhang Jiang · Shuncheng He · Xiangyang Ji -
2022 Poster: SPD: Synergy Pattern Diversifying Oriented Unsupervised Multi-agent Reinforcement Learning »
Yuhang Jiang · Jianzhun Shao · Shuncheng He · Hongchang Zhang · Xiangyang Ji -
2022 : An Empirical Study on Distribution Shift Robustness From the Perspective of Pre-Training and Data Augmentation »
Ziquan Liu · Yi Xu · Yuanhong Xu · Qi Qian · Hao Li · Rong Jin · Xiangyang Ji · Antoni Chan -
2022 Spotlight: Lightning Talks 5A-3 »
Minting Pan · Xiang Chen · Wenhan Huang · Can Chang · Zhecheng Yuan · Jianzhun Shao · Yushi Cao · Peihao Chen · Ke Xue · Zhengrong Xue · Zhiqiang Lou · Xiangming Zhu · Lei Li · Zhiming Li · Kai Li · Jiacheng Xu · Dongyu Ji · Ni Mu · Kun Shao · Tianpei Yang · Kunyang Lin · Ningyu Zhang · Yunbo Wang · Lei Yuan · Bo Yuan · Hongchang Zhang · Jiajun Wu · Tianze Zhou · Xueqian Wang · Ling Pan · Yuhang Jiang · Xiaokang Yang · Xiaozhuan Liang · Hao Zhang · Weiwen Hu · Miqing Li · YAN ZHENG · Matthew Taylor · Huazhe Xu · Shumin Deng · Chao Qian · YI WU · Shuncheng He · Wenbing Huang · Chuanqi Tan · Zongzhang Zhang · Yang Gao · Jun Luo · Yi Li · Xiangyang Ji · Thomas Li · Mingkui Tan · Fei Huang · Yang Yu · Huazhe Xu · Dongge Wang · Jianye Hao · Chuang Gan · Yang Liu · Luo Si · Hangyu Mao · Huajun Chen · Jianye Hao · Jun Wang · Xiaotie Deng -
2022 Spotlight: Self-Organized Group for Cooperative Multi-agent Reinforcement Learning »
Jianzhun Shao · Zhiqiang Lou · Hongchang Zhang · Yuhang Jiang · Shuncheng He · Xiangyang Ji -
2022 Spotlight: Lightning Talks 2A-3 »
David Buterez · Chengan He · Xuan Kan · Yutong Lin · Konstantin Schürholt · Yu Yang · Louis Annabi · Wei Dai · Xiaotian Cheng · Alexandre Pitti · Ze Liu · Jon Paul Janet · Jun Saito · Boris Knyazev · Mathias Quoy · Zheng Zhang · James Zachary · Steven J Kiddle · Xavier Giro-i-Nieto · Chang Liu · Hejie Cui · Zilong Zhang · Hakan Bilen · Damian Borth · Dino Oglic · Holly Rushmeier · Han Hu · Xiangyang Ji · Yi Zhou · Nanning Zheng · Ying Guo · Pietro Liò · Stephen Lin · Carl Yang · Yue Cao -
2022 Spotlight: Distilling Representations from GAN Generator via Squeeze and Span »
Yu Yang · Xiaotian Cheng · Chang Liu · Hakan Bilen · Xiangyang Ji -
2022 Spotlight: Lightning Talks 1B-4 »
Andrei Atanov · Shiqi Yang · Wanshan Li · Yongchang Hao · Ziquan Liu · Jiaxin Shi · Anton Plaksin · Jiaxiang Chen · Ziqi Pan · yaxing wang · Yuxin Liu · Stepan Martyanov · Alessandro Rinaldo · Yuhao Zhou · Li Niu · Qingyuan Yang · Andrei Filatov · Yi Xu · Liqing Zhang · Lili Mou · Ruomin Huang · Teresa Yeo · kai wang · Daren Wang · Jessica Hwang · Yuanhong Xu · Qi Qian · Hu Ding · Michalis Titsias · Shangling Jui · Ajay Sohmshetty · Lester Mackey · Joost van de Weijer · Hao Li · Amir Zamir · Xiangyang Ji · Antoni Chan · Rong Jin -
2022 Spotlight: Improved Fine-Tuning by Better Leveraging Pre-Training Data »
Ziquan Liu · Yi Xu · Yuanhong Xu · Qi Qian · Hao Li · Xiangyang Ji · Antoni Chan · Rong Jin -
2022 Poster: Distilling Representations from GAN Generator via Squeeze and Span »
Yu Yang · Xiaotian Cheng · Chang Liu · Hakan Bilen · Xiangyang Ji -
2022 Poster: Improved Fine-Tuning by Better Leveraging Pre-Training Data »
Ziquan Liu · Yi Xu · Yuanhong Xu · Qi Qian · Hao Li · Xiangyang Ji · Antoni Chan · Rong Jin -
2021 Poster: Improved Variance-Aware Confidence Sets for Linear Bandits and Linear Mixture MDP »
Zihan Zhang · Jiaqi Yang · Xiangyang Ji · Simon Du -
2021 Poster: TransMIL: Transformer based Correlated Multiple Instance Learning for Whole Slide Image Classification »
Zhuchen Shao · Hao Bian · Yang Chen · Yifeng Wang · Jian Zhang · Xiangyang Ji · yongbing zhang -
2020 Poster: Almost Optimal Model-Free Reinforcement Learningvia Reference-Advantage Decomposition »
Zihan Zhang · Yuan Zhou · Xiangyang Ji