Model-based Distributional Reinforcement Learning for Risk-sensitive Control
Hao Liang · Zhiquan Luo
Tue Dec 14 09:00 AM -- 10:00 AM (PST)
We consider finite episodic Markov decision processes whose objective is the entropic risk measure of the return, for risk-sensitive control. We identify several properties of the entropic risk measure that establish distributional dynamic programming. We propose a novel model-based distributional reinforcement learning (DRL) algorithm, \textbf{R}isk-sensitive \textbf{O}ptimistic \textbf{D}istribution \textbf{I}teration (RODI), that implements optimism through three different subroutines. We prove that all of them attain an $\tilde{O}(\frac{\exp(|\beta| H)-1}{|\beta|}\exp(|\beta| H^2)H\sqrt{S^2AK})$ regret upper bound, where $S$ is the number of states, $A$ the number of actions, $H$ the time horizon and $K$ the number of episodes. This bound matches that of RSVI in previous work, and the regret analysis is conceptually simple and can be easily extended to general risk measures satisfying several key properties. To the best of our knowledge, this is the first regret analysis of DRL, which theoretically verifies the efficacy of DRL for risk-sensitive control. We find that the proof of the lower bound in existing work contains mistakes and that the corrected proof only implies an $\Omega(\frac{\exp(|\beta| H/2)-1}{|\beta|}\sqrt{K})$ regret, which is independent of $S$ and $A$ and loose in its polynomial dependency on $H$. We improve the result by proving a tighter lower bound of $\Omega(\frac{\exp(\beta H/6)-1}{\beta H}H\sqrt{SAT})$ for the case $\beta>0$.
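As a rough illustration of the objective named in the abstract, the sketch below evaluates the entropic risk measure of a discrete return distribution, assuming the common convention $\frac{1}{\beta}\log \mathbb{E}[\exp(\beta X)]$ (the abstract does not state the sign convention, so this is an assumption). The function name `entropic_risk` and the two-outcome example are hypothetical and are not taken from the paper; this is not the RODI algorithm.

```python
import math

def entropic_risk(values, probs, beta):
    """Entropic risk (1/beta) * log E[exp(beta * X)] of a discrete distribution.

    Assumed convention: beta > 0 is risk-seeking, beta < 0 is risk-averse,
    and beta == 0 is treated as the plain expectation (the limiting case).
    """
    if beta == 0:
        return sum(p * v for v, p in zip(values, probs))
    # Log-sum-exp shift keeps exp() from overflowing for large |beta * v|.
    scaled = [beta * v for v in values]
    m = max(scaled)
    s = sum(p * math.exp(z - m) for z, p in zip(scaled, probs))
    return (m + math.log(s)) / beta

# Hypothetical return distribution: 0 or 1 with equal probability.
values = [0.0, 1.0]
probs = [0.5, 0.5]

mean = entropic_risk(values, probs, 0.0)
risk_averse = entropic_risk(values, probs, -2.0)
risk_seeking = entropic_risk(values, probs, 2.0)

# Shifting every outcome by a constant shifts the risk value by the same
# constant, one property that makes a stage-wise recursion plausible.
shifted = entropic_risk([v + 3.0 for v in values], probs, 2.0)
```

Under this convention the risk-averse value falls below the mean and the risk-seeking value lies above it, while adding a deterministic reward shifts the measure by exactly that reward.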
Author Information
Hao Liang (The Chinese University of Hong Kong, Shenzhen)
Zhiquan Luo (The Chinese University of Hong Kong, Shenzhen and Shenzhen Research Institute of Big Data)
More from the Same Authors
-
2022 : Smoothed-SGDmax: A Stability-Inspired Algorithm to Improve Adversarial Generalization »
Jiancong Xiao · Jiawei Zhang · Zhiquan Luo · Asuman Ozdaglar -
2021 : HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning »
Ziniu Li · Yingru Li · Yushun Zhang · Tong Zhang · Zhiquan Luo -
2020 Poster: A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems »
Jiawei Zhang · Peijun Xiao · Ruoyu Sun · Zhiquan Luo