Timezone: »
Sample efficiency and risk-awareness are central to the development of practical reinforcement learning (RL) for complex decision-making. The former can be addressed by transfer learning, while the latter by optimizing some utility function of the return. However, the problem of transferring skills in a risk-aware manner is not well-understood. In this paper, we address the problem of transferring policies between tasks in a common domain that differ only in their reward functions, in which risk is measured by the variance of reward streams. Our approach begins by extending the idea of generalized policy improvement to maximize entropic utilities, thus extending the dynamic programming's policy improvement operation to sets of policies \emph{and} levels of risk-aversion. Next, we extend the idea of successor features (SF), a value function representation that decouples the environment dynamics from the rewards, to capture the variance of returns. Our resulting risk-aware successor features (RaSF) integrate seamlessly within the RL framework, inherit the superior task generalization ability of SFs, while incorporating risk into the decision-making. Experiments on a discrete navigation domain and control of a simulated robotic arm demonstrate the ability of RaSFs to outperform alternative methods including SFs, when taking the risk of the learned policies into account.
Author Information
Michael Gimelfarb (University of Toronto)
Andre Barreto (DeepMind)
Scott Sanner (University of Toronto)
Chi-Guhn Lee (University of Toronto)
More from the Same Authors
-
2021 Spotlight: Proper Value Equivalence »
Christopher Grimm · Andre Barreto · Greg Farquhar · David Silver · Satinder Singh -
2023 Poster: Deep Reinforcement Learning with Plasticity Injection »
Evgenii Nikishin · Junhyuk Oh · Georg Ostrovski · Clare Lyle · Razvan Pascanu · Will Dabney · Andre Barreto -
2023 Poster: A Definition of Continual Reinforcement Learning »
David Abel · Andre Barreto · Benjamin Van Roy · Doina Precup · Hado van Hasselt · Satinder Singh -
2022 Poster: Approximate Value Equivalence »
Christopher Grimm · Andre Barreto · Satinder Singh -
2022 Poster: The Phenomenon of Policy Churn »
Tom Schaul · Andre Barreto · John Quan · Georg Ostrovski -
2021 Poster: Proper Value Equivalence »
Christopher Grimm · Andre Barreto · Greg Farquhar · David Silver · Satinder Singh -
2021 Poster: Representer Point Selection via Local Jacobian Expansion for Post-hoc Classifier Explanation of Deep Neural Networks and Ensemble Models »
Yi Sui · Ga Wu · Scott Sanner -
2020 Poster: On Efficiency in Hierarchical Reinforcement Learning »
Zheng Wen · Doina Precup · Morteza Ibrahimi · Andre Barreto · Benjamin Van Roy · Satinder Singh -
2020 Poster: The Value Equivalence Principle for Model-Based Reinforcement Learning »
Christopher Grimm · Andre Barreto · Satinder Singh · David Silver -
2020 Spotlight: On Efficiency in Hierarchical Reinforcement Learning »
Zheng Wen · Doina Precup · Morteza Ibrahimi · Andre Barreto · Benjamin Van Roy · Satinder Singh -
2019 Poster: Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates »
Carlos Riquelme · Hugo Penedones · Damien Vincent · Hartmut Maennel · Sylvain Gelly · Timothy A Mann · Andre Barreto · Gergely Neu -
2019 Demonstration: The Option Keyboard: Combining Skills in Reinforcement Learning »
Daniel Toyama · Shaobo Hou · Gheorghe Comanici · Andre Barreto · Doina Precup · Shibl Mourad · Eser Aygün · Philippe Hamel -
2019 Poster: The Option Keyboard: Combining Skills in Reinforcement Learning »
Andre Barreto · Diana Borsa · Shaobo Hou · Gheorghe Comanici · Eser Aygün · Philippe Hamel · Daniel Toyama · jonathan j hunt · Shibl Mourad · David Silver · Doina Precup -
2018 Poster: Reinforcement Learning with Multiple Experts: A Bayesian Model Combination Approach »
Michael Gimelfarb · Scott Sanner · Chi-Guhn Lee -
2018 Poster: Fast deep reinforcement learning using online adjustments from the past »
Steven Hansen · Alexander Pritzel · Pablo Sprechmann · Andre Barreto · Charles Blundell -
2017 Poster: Natural Value Approximators: Learning when to Trust Past Estimates »
Zhongwen Xu · Joseph Modayil · Hado van Hasselt · Andre Barreto · David Silver · Tom Schaul -
2017 Poster: Successor Features for Transfer in Reinforcement Learning »
Andre Barreto · Will Dabney · Remi Munos · Jonathan Hunt · Tom Schaul · David Silver · Hado van Hasselt -
2017 Spotlight: Successor Features for Transfer in Reinforcement Learning »
Andre Barreto · Will Dabney · Remi Munos · Jonathan Hunt · Tom Schaul · David Silver · Hado van Hasselt -
2017 Spotlight: Natural Value Approximators: Learning when to Trust Past Estimates »
Zhongwen Xu · Joseph Modayil · Hado van Hasselt · Andre Barreto · David Silver · Tom Schaul -
2012 Poster: On-line Reinforcement Learning Using Incremental Kernel-Based Stochastic Factorization »
Andre S Barreto · Doina Precup · Joelle Pineau -
2011 Poster: Reinforcement Learning using Kernel-Based Stochastic Factorization »
Andre S Barreto · Doina Precup · Joelle Pineau