

Poster

Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function

Zihan Zhang · Xiangyang Ji

East Exhibition Hall B, C #211

Keywords: [ Reinforcement Learning ] [ Reinforcement Learning and Planning ] [ Decision and Control; Reinforcement Learning and Planning ] [ Algorithms -> Online Learning; Reinforcement Learning and Planning ]


Abstract: We present an algorithm based on the \emph{Optimism in the Face of Uncertainty} (OFU) principle that efficiently learns a Reinforcement Learning (RL) problem modeled by a Markov decision process (MDP) with a finite state-action space. By evaluating the state-pair difference of the optimal bias function $h^{*}$, the proposed algorithm achieves a regret bound of $\tilde{O}(\sqrt{SAHT})$\footnote{The symbol $\tilde{O}$ means $O$ with logarithmic factors ignored.} for an MDP with $S$ states and $A$ actions, in the case that an upper bound $H$ on the span of $h^{*}$, i.e., $\mathrm{sp}(h^{*})$, is known. This result outperforms the best previous regret bound $\tilde{O}(S\sqrt{AHT})$ \citep{fruit2019improved} by a factor of $\sqrt{S}$. Furthermore, this regret bound matches the lower bound of $\Omega(\sqrt{SAHT})$ \citep{jaksch2010near} up to a logarithmic factor. As a consequence, we show that there is a near-optimal regret bound of $\tilde{O}(\sqrt{SADT})$ for MDPs with a finite diameter $D$, compared to the lower bound of $\Omega(\sqrt{SADT})$ \citep{jaksch2010near}.
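
For context, the quantities in these bounds follow the standard average-reward formulation; the definitions below are a brief sketch in that standard notation (the symbols $\rho^{*}$ for the optimal gain and $r_{t}$ for the reward collected at step $t$ are assumptions of this sketch, not notation taken from the paper). The regret over $T$ steps measures the gap to the optimal long-run average reward, and $H$ upper-bounds the span of the optimal bias function:
\[
  \mathrm{Regret}(T) \;=\; T\rho^{*} \;-\; \sum_{t=1}^{T} r_{t},
  \qquad
  \mathrm{sp}(h^{*}) \;=\; \max_{s} h^{*}(s) \;-\; \min_{s} h^{*}(s) \;\le\; H .
\]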
