Model-based Distributional Reinforcement Learning for Risk-sensitive Control
Hao Liang · Zhiquan Luo
Tue Dec 14 09:00 AM - 10:00 AM (PST)
We consider finite episodic Markov decision processes whose objective is the entropic risk measure of the return, for risk-sensitive control. We identify several properties of the entropic risk measure that establish distributional dynamic programming. We propose a novel model-based distributional reinforcement learning (DRL) algorithm, \textbf{R}isk-sensitive \textbf{O}ptimistic \textbf{D}istribution \textbf{I}teration (RODI), that implements optimism through three different subroutines. We prove that all of them attain a $\tilde{O}(\frac{\exp(\beta H)-1}{\beta}\exp(\beta H^2)H\sqrt{S^2AK})$ regret upper bound, where $S$ is the number of states, $A$ the number of actions, $H$ the time horizon and $K$ the number of episodes. This bound matches that of RSVI in previous work, and the regret analysis is conceptually simple and can be easily extended to general risk measures satisfying several key properties. To the best of our knowledge, this is the first regret analysis of DRL, which theoretically verifies the efficacy of DRL for risk-sensitive control. We find that the proof of the lower bound in existing work contains mistakes, and that the corrected proof only implies an $\Omega(\frac{\exp(\beta H/2)-1}{\beta}\sqrt{K})$ regret, which is independent of $S$ and $A$ and loose in its polynomial dependency on $H$. We improve the result by proving a tighter lower bound of $\Omega(\frac{\exp(\beta H/6)-1}{\beta H}H\sqrt{SAT})$ for the $\beta>0$ case.
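As a rough illustration of the objective named in the abstract, the sketch below computes the entropic risk measure of a discrete return distribution using the standard definition $\frac{1}{\beta}\log \mathbb{E}[\exp(\beta X)]$. The function name `entropic_risk`, the example distribution, and the sign convention (positive $\beta$ favouring high returns) are assumptions made for illustration and are not taken from the paper. It also checks that adding a constant reward shifts the risk value by that constant, which is the kind of property that makes a dynamic-programming recursion plausible.

```python
import math

def entropic_risk(values, probs, beta):
    """Entropic risk (1/beta) * log E[exp(beta * X)] of a discrete distribution.

    Hypothetical helper for illustration; beta == 0 falls back to the mean.
    """
    if beta == 0:
        return sum(p * v for v, p in zip(values, probs))
    # Log-sum-exp trick: subtract the largest exponent for numerical stability.
    exponents = [beta * v for v in values]
    m = max(exponents)
    total = sum(p * math.exp(e - m) for e, p in zip(exponents, probs))
    return (m + math.log(total)) / beta

# Assumed toy return distribution: 0 or 1 with equal probability.
values = [0.0, 1.0]
probs = [0.5, 0.5]

risk_seeking = entropic_risk(values, probs, 1.0)   # above the mean 0.5
risk_averse = entropic_risk(values, probs, -1.0)   # below the mean 0.5

# Adding a deterministic reward of 2 shifts the risk value by exactly 2.
shifted = entropic_risk([v + 2.0 for v in values], probs, 1.0)
```

With this toy distribution, the risk value lies above the mean for positive `beta` and below it for negative `beta`, while `shifted` differs from `risk_seeking` by the added constant up to floating-point error.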
Author Information
Hao Liang (The Chinese University of Hong Kong, Shenzhen)
Zhiquan Luo (The Chinese University of Hong Kong, Shenzhen and Shenzhen Research Institute of Big Data)
More from the Same Authors

2022 : Smoothed-SGDmax: A Stability-Inspired Algorithm to Improve Adversarial Generalization
Jiancong Xiao · Jiawei Zhang · Zhiquan Luo · Asuman Ozdaglar 
2021 : HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning
Ziniu Li · Yingru Li · Yushun Zhang · Tong Zhang · Zhiquan Luo 
2020 Poster: A Single-Loop Smoothed Gradient Descent-Ascent Algorithm for Nonconvex-Concave Min-Max Problems
Jiawei Zhang · Peijun Xiao · Ruoyu Sun · Zhiquan Luo