Timezone: »
We study risk-sensitive reinforcement learning (RL) based on the entropic risk measure. Although existing works have established non-asymptotic regret guarantees for this problem, they leave open an exponential gap between the upper and lower bounds. We identify the deficiencies in existing algorithms and their analysis that result in such a gap. To remedy these deficiencies, we investigate a simple transformation of the risk-sensitive Bellman equations, which we call the exponential Bellman equation. The exponential Bellman equation inspires us to develop a novel analysis of Bellman backup procedures in risk-sensitive RL algorithms, and further motivates the design of a novel exploration mechanism. We show that these analytic and algorithmic innovations together lead to improved regret upper bounds over existing ones.
Author Information
Yingjie Fei (Cornell University)
Zhuoran Yang (Princeton)
Yudong Chen (Cornell University)
Zhaoran Wang (Princeton University)
More from the Same Authors
-
2021 : GPU-Podracer: Scalable and Elastic Library for Cloud-Native Deep Reinforcement Learning »
Xiao-Yang Liu · Zhuoran Yang · Zhaoran Wang · Anwar Walid · Jian Guo · Michael Jordan -
2021 : Exponential Family Model-Based Reinforcement Learning via Score Matching »
Gene Li · Junbo Li · Nathan Srebro · Zhaoran Wang · Zhuoran Yang -
2021 Poster: Pessimism Meets Invariance: Provably Efficient Offline Mean-Field Multi-Agent RL »
Minshuo Chen · Yan Li · Ethan Wang · Zhuoran Yang · Zhaoran Wang · Tuo Zhao -
2021 Poster: A Near-Optimal Algorithm for Stochastic Bilevel Optimization via Double-Momentum »
Prashant Khanduri · Siliang Zeng · Mingyi Hong · Hoi-To Wai · Zhaoran Wang · Zhuoran Yang -
2021 Poster: BooVI: Provably Efficient Bootstrapped Value Iteration »
Boyi Liu · Qi Cai · Zhuoran Yang · Zhaoran Wang -
2021 Poster: Wasserstein Flow Meets Replicator Dynamics: A Mean-Field Analysis of Representation Learning in Actor-Critic »
Yufeng Zhang · Siyu Chen · Zhuoran Yang · Michael Jordan · Zhaoran Wang -
2021 Poster: Offline Constrained Multi-Objective Reinforcement Learning via Pessimistic Dual Value Iteration »
Runzhe Wu · Yufeng Zhang · Zhuoran Yang · Zhaoran Wang -
2021 Poster: Rank Overspecified Robust Matrix Recovery: Subgradient Method and Exact Recovery »
Lijun Ding · Liwei Jiang · Yudong Chen · Qing Qu · Zhihui Zhu -
2021 Poster: Dynamic Bottleneck for Robust Self-Supervised Exploration »
Chenjia Bai · Lingxiao Wang · Lei Han · Animesh Garg · Jianye Hao · Peng Liu · Zhaoran Wang -
2021 Poster: Provably Efficient Causal Reinforcement Learning with Confounded Observational Data »
Lingxiao Wang · Zhuoran Yang · Zhaoran Wang -
2020 Poster: Pontryagin Differentiable Programming: An End-to-End Learning and Control Framework »
Wanxin Jin · Zhaoran Wang · Zhuoran Yang · Shaoshuai Mou -
2020 Poster: Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory »
Yufeng Zhang · Qi Cai · Zhuoran Yang · Yongxin Chen · Zhaoran Wang -
2020 Oral: Can Temporal-Difference and Q-Learning Learn Representation? A Mean-Field Theory »
Yufeng Zhang · Qi Cai · Zhuoran Yang · Yongxin Chen · Zhaoran Wang -
2020 Poster: Provably Efficient Neural GTD for Off-Policy Learning »
Hoi-To Wai · Zhuoran Yang · Zhaoran Wang · Mingyi Hong -
2020 Poster: Provably Efficient Neural Estimation of Structural Equation Models: An Adversarial Approach »
Luofeng Liao · You-Lin Chen · Zhuoran Yang · Bo Dai · Mladen Kolar · Zhaoran Wang -
2020 Poster: Dynamic Regret of Policy Optimization in Non-Stationary Environments »
Yingjie Fei · Zhuoran Yang · Zhaoran Wang · Qiaomin Xie -
2020 Poster: On Function Approximation in Reinforcement Learning: Optimism in the Face of Large State Spaces »
Zhuoran Yang · Chi Jin · Zhaoran Wang · Mengdi Wang · Michael Jordan -
2020 Poster: Upper Confidence Primal-Dual Reinforcement Learning for CMDP with Adversarial Loss »
Shuang Qiu · Xiaohan Wei · Zhuoran Yang · Jieping Ye · Zhaoran Wang -
2020 Poster: Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret »
Yingjie Fei · Zhuoran Yang · Yudong Chen · Zhaoran Wang · Qiaomin Xie -
2020 Spotlight: Risk-Sensitive Reinforcement Learning: Near-Optimal Risk-Sample Tradeoff in Regret »
Yingjie Fei · Zhuoran Yang · Yudong Chen · Zhaoran Wang · Qiaomin Xie -
2019 Poster: Factor Group-Sparse Regularization for Efficient Low-Rank Matrix Recovery »
Jicong Fan · Lijun Ding · Yudong Chen · Madeleine Udell -
2019 Poster: Global Convergence of Least Squares EM for Demixing Two Log-Concave Densities »
Wei Qian · Yuqian Zhang · Yudong Chen -
2016 Poster: NESTT: A Nonconvex Primal-Dual Splitting Method for Distributed and Stochastic Optimization »
Davood Hajinezhad · Mingyi Hong · Tuo Zhao · Zhaoran Wang -
2016 Poster: Agnostic Estimation for Misspecified Phase Retrieval Models »
Matey Neykov · Zhaoran Wang · Han Liu -
2016 Poster: Online ICA: Understanding Global Dynamics of Nonconvex Optimization via Diffusion Processes »
Chris Junchi Li · Zhaoran Wang · Han Liu -
2016 Poster: Blind Attacks on Machine Learners »
Alex Beatson · Zhaoran Wang · Han Liu -
2016 Poster: More Supervision, Less Computation: Statistical-Computational Tradeoffs in Weakly Supervised Learning »
Xinyang Yi · Zhaoran Wang · Zhuoran Yang · Constantine Caramanis · Han Liu -
2015 Poster: Optimal Linear Estimation under Unknown Nonlinear Transform »
Xinyang Yi · Zhaoran Wang · Constantine Caramanis · Han Liu -
2015 Poster: Non-convex Statistical Optimization for Sparse Tensor Graphical Model »
Wei Sun · Zhaoran Wang · Han Liu · Guang Cheng -
2015 Poster: High Dimensional EM Algorithm: Statistical Optimization and Asymptotic Normality »
Zhaoran Wang · Quanquan Gu · Yang Ning · Han Liu -
2015 Poster: A Nonconvex Optimization Framework for Low Rank Matrix Estimation »
Tuo Zhao · Zhaoran Wang · Han Liu -
2014 Poster: Sparse PCA with Oracle Property »
Quanquan Gu · Zhaoran Wang · Han Liu -
2014 Poster: Tighten after Relax: Minimax-Optimal Sparse PCA in Polynomial Time »
Zhaoran Wang · Huanran Lu · Han Liu