MESA: Offline Meta-RL for Safe Adaptation and Fault Tolerance
Michael Luo · Ashwin Balakrishna · Brijen Thananjeyan · Suraj Nair · Julian Ibarz · Jie Tan · Chelsea Finn · Ion Stoica · Ken Goldberg

Safe exploration is critical for using reinforcement learning in risk-sensitive environments. Recent work learns risk measures that estimate the probability of violating constraints and can then be used to enforce safety. However, learning such risk measures requires significant interaction with the environment, resulting in excessive constraint violations during learning. Furthermore, these measures are not easily transferable to new environments in which the agent may be deployed. We cast safe exploration as an offline meta-reinforcement learning problem, where the objective is to leverage examples of safe and unsafe behavior across a range of environments to quickly adapt learned risk measures to a new environment with previously unseen dynamics. We then propose MEta-learning for Safe Adaptation (MESA), an approach for meta-learning a risk measure for safe reinforcement learning. Simulation experiments across 3 continuous control domains suggest that MESA can leverage offline data from a range of different environments to reduce constraint violations in unseen environments by up to a factor of 2 while maintaining task performance.
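
To make the idea of "using a risk measure to enable safety" concrete, here is a minimal sketch of one common pattern: screening candidate actions against an estimated probability of constraint violation. It is not the MESA implementation. The names `filter_safe_action`, `toy_risk`, and `eps_risk` are hypothetical, and `toy_risk` is a hand-written stand-in for what would be a learned (and, in the abstract's setting, meta-learned and then adapted) risk measure.

```python
from typing import Callable, List, Sequence

Action = Sequence[float]
State = Sequence[float]


def filter_safe_action(
    state: State,
    candidates: List[Action],
    risk: Callable[[State, Action], float],
    eps_risk: float = 0.2,
) -> Action:
    """Pick the first candidate whose estimated risk is at most eps_risk.

    If no candidate passes the threshold, fall back to the least risky one.
    `risk` is assumed to return an estimated probability of violating a
    constraint, in [0, 1].
    """
    for action in candidates:
        if risk(state, action) <= eps_risk:
            return action
    return min(candidates, key=lambda a: risk(state, a))


def toy_risk(state: State, action: Action) -> float:
    # Hypothetical stand-in for a learned risk measure: risk rises
    # linearly as the next 1-D position approaches a wall at x = 1.0.
    next_x = state[0] + action[0]
    return min(1.0, max(0.0, next_x))


if __name__ == "__main__":
    state = [0.5]
    candidates = [[0.4], [0.1], [-0.2]]
    print(filter_safe_action(state, candidates, toy_risk, eps_risk=0.7))
```

In this toy setup the task policy's preferred action comes first in `candidates`, so the filter only overrides it when the estimated risk exceeds the threshold. A lower `eps_risk` trades task performance for fewer expected violations.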

Author Information

Michael Luo (University of California Berkeley)
Ashwin Balakrishna (UC Berkeley)

I am a second-year PhD student in Robotics and Artificial Intelligence at UC Berkeley, advised by Professor Ken Goldberg of the UC Berkeley AUTOLAB. My research interests are in developing algorithms for imitation and reinforcement learning that are reliable and robust enough to safely deploy on robotic systems. I am currently interested in hybrid algorithms that combine imitation and reinforcement learning, leveraging demonstrations either to guide exploration in RL or to perform reward inference. I received my bachelor’s degree in Electrical Engineering from Caltech in 2018, and I enjoy watching and playing tennis, hiking, and eating interesting foods.

Brijen Thananjeyan (UC Berkeley)
Suraj Nair (Stanford University)
Julian Ibarz (Google Inc.)
Jie Tan (Google)
Chelsea Finn (Stanford)
Ion Stoica (UC Berkeley)
Ken Goldberg (UC Berkeley)
