Distributional reinforcement learning aims to learn the distribution of returns in stochastic environments. Since the learned return distribution carries rich information about the stochasticity of the environment, previous studies have relied on its descriptive statistics, such as the standard deviation, for optimism in the face of uncertainty. However, in these methods, exploring with a fixed criterion that is one-sided with respect to risk can bias the uncertainty derived from the empirical distribution and hinder convergence and performance. In this paper, we propose a novel distributional reinforcement learning method that explores by randomizing the risk criterion while still reaching a risk-neutral optimal policy. First, we introduce a perturbed distributional Bellman optimality operator that distorts the risk measure used for action selection. Second, we prove the convergence and optimality of the proposed method using a weaker contraction property. Our theoretical results show that the proposed method does not fall into biased exploration and is guaranteed to converge to an optimal return distribution. Finally, we empirically demonstrate that our method outperforms existing distribution-based algorithms in various environments, including 55 Atari games.
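As a rough illustration of the randomized-risk-criterion idea (a minimal sketch, not the paper's actual algorithm), the code below selects a greedy action from per-action return quantile estimates under a CVaR-style risk level that is re-drawn at random each step; the helper names cvar and select_action, the candidate beta values, and the QR-DQN-style quantile input are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def cvar(quantiles, beta):
    """CVaR_beta of a return distribution given N quantile samples.

    beta = 1.0 recovers the risk-neutral mean; smaller beta averages only
    the lower tail (risk-averse). Hypothetical helper for illustration.
    """
    q = np.sort(quantiles)
    k = max(int(np.ceil(beta * len(q))), 1)
    return q[:k].mean()

def select_action(quantiles_per_action, betas=(0.25, 0.5, 1.0)):
    """Greedy action under a risk criterion drawn at random each step.

    quantiles_per_action: array of shape (num_actions, num_quantiles),
    e.g., the output of a QR-DQN / IQN-style critic for one state.
    Randomizing beta avoids committing to a single, one-sided risk
    preference during exploration (assumed candidate values above).
    """
    beta = rng.choice(betas)                      # randomized risk criterion
    scores = [cvar(q, beta) for q in quantiles_per_action]
    return int(np.argmax(scores))

# Toy usage: 3 actions, 8 quantile samples each.
quantiles = rng.normal(loc=[[0.0], [0.2], [0.1]], scale=1.0, size=(3, 8))
print(select_action(quantiles))
```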
Author Information
Taehyun Cho (Seoul National University)
Seungyub Han (Seoul National University)
Heesoo Lee (Seoul National University)
Kyungjae Lee (ChungAng University)
Jungwoo Lee (Seoul National University)
More from the Same Authors
- 2022 : Adaptive Methods for Nonconvex Continual Learning »
  Seungyub Han · Yeongmo Kim · Taehyun Cho · Jungwoo Lee
- 2023 Poster: Pitfall of Optimism: Distributional Reinforcement Learning by Randomizing Risk Criterion »
  Taehyun Cho · Seungyub Han · Heesoo Lee · Kyungjae Lee · Jungwoo Lee
- 2023 Poster: Trust Region-Based Safe Distributional Reinforcement Learning for Multiple Constraints »
  Dohyeong Kim · Kyungjae Lee · Songhwai Oh
- 2023 Poster: SPQR: Controlling Q-ensemble Independence for Reinforcement Learning »
  Dohyeok Lee · Seungyub Han · Taehyun Cho · Jungwoo Lee
- 2023 Poster: Sequential Preference Ranking for Efficient Reinforcement Learning from Human Feedback »
  Minyoung Hwang · Gunmin Lee · Hogun Kee · Chan Woo Kim · Kyungjae Lee · Songhwai Oh
- 2023 Poster: Score-based Generative Modeling through Stochastic Evolution Equations »
  Sungbin Lim · Eunbi Yoon · Taehyun Byun · Taewon Kang · Seungwoo Kim · Kyungjae Lee · Sungjoon Choi
- 2022 : Poster Session 1 »
  Andrew Lowy · Thomas Bonnier · Yiling Xie · Guy Kornowski · Simon Schug · Seungyub Han · Nicolas Loizou · xinwei zhang · Laurent Condat · Tabea E. Röber · Si Yi Meng · Marco Mondelli · Runlong Zhou · Eshaan Nichani · Adrian Goldwaser · Rudrajit Das · Kayhan Behdin · Atish Agarwala · Mukul Gagrani · Gary Cheng · Tian Li · Haoran Sun · Hossein Taheri · Allen Liu · Siqi Zhang · Dmitrii Avdiukhin · Bradley Brown · Miaolan Xie · Junhyung Lyle Kim · Sharan Vaswani · Xinmeng Huang · Ganesh Ramachandra Kini · Angela Yuan · Weiqiang Zheng · Jiajin Li
- 2022 Poster: Riemannian Neural SDE: Learning Stochastic Representations on Manifolds »
  Sung Woo Park · Hyomin Kim · Kyungjae Lee · Junseok Kwon
- 2018 : Poster session »
  David Zeng · Marzieh S. Tahaei · Shuai Chen · Felix Meister · Meet Shah · Anant Gupta · Ajil Jalal · Eirini Arvaniti · David Zimmerer · Konstantinos Kamnitsas · Pedro Ballester · Nathaniel Braman · Udaya Kumar · Sil C. van de Leemput · Junaid Qadir · Hoel Kervadec · Mohamed Akrout · Adrian Tousignant · Matthew Ng · Raghav Mehta · Miguel Monteiro · Sumana Basu · Jonas Adler · Adrian Dalca · Jizong Peng · Sungyeob Han · Xiaoxiao Li · Karthik Gopinath · Joseph Cheng · Bogdan Georgescu · Kha Gia Quach · Karthik Sarma · David Van Veen