Timezone: »

Regret Bounds for Risk-Sensitive Reinforcement Learning
Osbert Bastani · Jason Yecheng Ma · Estelle Shen · Wanqiao Xu

Wed Nov 30 02:00 PM -- 04:00 PM (PST) @ Hall J #720

In safety-critical applications of reinforcement learning such as healthcare and robotics, it is often desirable to optimize risk-sensitive objectives that account for tail outcomes rather than expected reward. We prove the first regret bounds for reinforcement learning under a general class of risk-sensitive objectives including the popular CVaR objective. Our theory is based on a novel characterization of the CVaR objective as well as a novel optimistic MDP construction.

Author Information

Osbert Bastani (University of Pennsylvania)
Jason Yecheng Ma (University of Pennsylvania)
Estelle Shen (University of Pennsylvania)
Wanqiao Xu (Stanford University)

More from the Same Authors