Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning
Jason Yecheng Ma · Andrew Shen · Osbert Bastani · Dinesh Jayaraman

We propose CAP, a model-based safe RL framework that accounts for potential modeling errors by capturing model uncertainty and adaptively exploiting it to balance the reward and cost objectives. First, CAP inflates predicted costs with an uncertainty-based penalty. Theoretically, we show that policies satisfying this conservative cost constraint are guaranteed to also be feasible in the true environment; this in turn guarantees the safety of all intermediate solutions during RL training. Second, CAP adaptively tunes this penalty during training using true cost feedback from the environment. We evaluate this conservative and adaptive penalty-based approach to model-based safe RL extensively on state- and image-based environments. Our results demonstrate substantial gains in sample efficiency while incurring fewer constraint violations than prior safe RL algorithms.
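To make the two mechanisms in the abstract concrete, here is a minimal Python sketch of the idea: inflate the model's predicted cost by an uncertainty-based penalty, then adapt the penalty weight from true cost feedback. It assumes a scalar uncertainty estimate and a simple proportional update rule; all names (kappa, conservative_cost, update_penalty) are illustrative assumptions, not the authors' reference implementation.

    # Illustrative sketch of a conservative and adaptive cost penalty.
    # Not the authors' implementation; names and update rule are assumptions.

    def conservative_cost(predicted_cost: float, uncertainty: float, kappa: float) -> float:
        """Inflate the model's predicted cost by an uncertainty-based penalty,
        so policies satisfying the inflated constraint are more likely to be
        feasible in the true environment."""
        return predicted_cost + kappa * uncertainty

    def update_penalty(kappa: float, observed_cost: float, cost_budget: float,
                       lr: float = 0.01) -> float:
        """Adapt the penalty weight from true cost feedback: increase kappa
        after constraint violations, decrease it when safely under budget
        (a simple proportional update)."""
        return max(0.0, kappa + lr * (observed_cost - cost_budget))

    # Hypothetical usage inside a training loop:
    kappa, cost_budget = 1.0, 25.0
    for episode_cost, pred_cost, uncert in [(30.0, 20.0, 4.0), (22.0, 18.0, 3.0)]:
        c_hat = conservative_cost(pred_cost, uncert, kappa)  # constrain planning on c_hat
        kappa = update_penalty(kappa, episode_cost, cost_budget)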

Author Information

Jason Yecheng Ma (University of Pennsylvania)
Andrew Shen (University of Melbourne)
Osbert Bastani (University of Pennsylvania)
Dinesh Jayaraman (University of Pennsylvania)

I am an assistant professor at UPenn’s GRASP lab. I lead the Perception, Action, and Learning (PAL) Research Group, where we work on problems at the intersection of computer vision, machine learning, and robotics.
