We present REVEL, a partially neural reinforcement learning (RL) framework for provably safe exploration in continuous state and action spaces. A key challenge for provably safe deep RL is that repeatedly verifying neural networks within a learning loop is computationally infeasible. We address this challenge using two policy classes: a general, neurosymbolic class with approximate gradients and a more restricted class of symbolic policies that allows efficient verification. Our learning algorithm is a mirror descent over policies: in each iteration, it safely lifts a symbolic policy into the neurosymbolic space, performs safe gradient updates to the resulting policy, and projects the updated policy into the safe symbolic subset, all without requiring explicit verification of neural networks. Our empirical results show that REVEL enforces safe exploration in many scenarios in which Constrained Policy Optimization does not, and that it can discover policies that outperform those learned through prior approaches to verified exploration.
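The loop described in the abstract (lift a verified symbolic policy, take safe gradient steps on the lifted policy, project back into the verifiable class) can be illustrated with a toy sketch. Every concrete choice below is a hypothetical assumption made for illustration, not REVEL's actual implementation. The assumptions are a 1-D system x' = x + a with safe set |x| <= 1, and a linear symbolic class a = -k*x whose "verification" is a closed-form interval check. A tanh residual stands in for the neural network. Finite differences replace real policy gradients, and least-squares imitation followed by clipping plays the role of projection.

```python
import numpy as np

# Hypothetical toy setting (not from the paper): dynamics x' = x + a,
# safe set |x| <= 1, reward -x'^2, fixed start states for determinism.
SAFE_BOUND = 1.0
START_STATES = np.array([-0.9, -0.5, 0.5, 0.9])
HORIZON = 10


def verify_symbolic(k):
    """Closed-form stand-in verifier for the symbolic class a = -k*x.

    Under it x' = (1 - k) * x, so |x'| <= |x| <= 1 whenever 0 <= k <= 2.
    """
    return 0.0 <= k <= 2.0


def shielded_action(x, k, theta):
    """Neurosymbolic policy: symbolic action plus a tanh 'neural' residual.

    The combined proposal is used only if the next state stays in the safe
    set, which the verified symbolic policy keeps invariant; otherwise the
    shield falls back to the symbolic action -k*x.
    """
    proposal = -k * x + np.tanh(theta[0] * x + theta[1])
    if abs(x + proposal) <= SAFE_BOUND:
        return proposal
    return -k * x


def rollout(k, theta):
    """Return (total reward, visited (state, action) pairs) over all start states."""
    total, pairs = 0.0, []
    for x in START_STATES:
        for _ in range(HORIZON):
            a = shielded_action(x, k, theta)
            pairs.append((x, a))
            x = x + a
            total -= x ** 2
    return total, pairs


def lift(k):
    # Zero residual: the lifted policy acts exactly like the symbolic one.
    return k, np.zeros(2)


def safe_gradient_steps(k, theta, steps=20, lr=0.01, eps=1e-4):
    # Central finite-difference gradient ascent on the residual parameters.
    # Every rollout goes through the shield, so exploration stays safe
    # without ever checking the tanh model itself.
    for _ in range(steps):
        grad = np.zeros_like(theta)
        for i in range(len(theta)):
            d = np.zeros_like(theta)
            d[i] = eps
            grad[i] = (rollout(k, theta + d)[0] - rollout(k, theta - d)[0]) / (2 * eps)
        theta = theta + lr * grad
    return theta


def project(k, theta):
    # Least-squares fit of a = -k_new * x to the shielded policy's actions,
    # then clip into the verifiable interval [0, 2].
    _, pairs = rollout(k, theta)
    xs, acts = np.array(pairs).T
    k_new = float(np.clip(-np.dot(acts, xs) / np.dot(xs, xs), 0.0, 2.0))
    assert verify_symbolic(k_new)
    return k_new


def mirror_descent(k=0.2, iterations=5):
    assert verify_symbolic(k)
    for _ in range(iterations):
        k, theta = lift(k)
        theta = safe_gradient_steps(k, theta)
        k = project(k, theta)
    return k
```

Calling `mirror_descent(0.2)` returns a new gain `k` that passes `verify_symbolic`. In this sketch, the residual-plus-shield form makes the lift exact (a zero residual reproduces the symbolic policy), and only the one-parameter symbolic policy is ever checked. That mirrors, in miniature, the abstract's point that no neural network needs explicit verification.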
Author Information
Greg Anderson (University of Texas at Austin)
Abhinav Verma (University of Texas at Austin)
Isil Dillig (UT Austin)
Swarat Chaudhuri (The University of Texas at Austin)
More from the Same Authors
2021 Spotlight: Neural Program Generation Modulo Static Analysis
Rohan Mukherjee · Yeming Wen · Dipak Chaudhari · Thomas Reps · Swarat Chaudhuri · Christopher Jermaine
2021 : Safe Neurosymbolic Learning with Differentiable Symbolic Execution
Chenxi Yang · Swarat Chaudhuri
2022 : Neurosymbolic Programming for Science
Jennifer J Sun · Megan Tjandrasuwita · Atharva Sehgal · Armando Solar-Lezama · Swarat Chaudhuri · Yisong Yue · Omar Costilla Reyes
2023 Poster: Compositional Policy Learning in Stochastic Control Systems with Formal Guarantees
Krishnendu Chatterjee · Thomas Henzinger · Mathias Lechner · Abhinav Verma · Đorđe Žikelić
2023 Poster: Satisfiability-Aided Language Models Using Declarative Prompting
Xi Ye · Qiaochu Chen · Isil Dillig · Greg Durrett
2022 : Q & A
Swarat Chaudhuri · Jennifer J Sun · Armando Solar-Lezama
2022 Tutorial: Neurosymbolic Programming
Swarat Chaudhuri · Jennifer J Sun · Armando Solar-Lezama
2022 : Neurosymbolic Programming
Swarat Chaudhuri · Jennifer J Sun · Armando Solar-Lezama
2022 Poster: Policy Optimization with Linear Temporal Logic Constraints
Cameron Voloshin · Hoang Le · Swarat Chaudhuri · Yisong Yue
2021 Poster: Neural Program Generation Modulo Static Analysis
Rohan Mukherjee · Yeming Wen · Dipak Chaudhari · Thomas Reps · Swarat Chaudhuri · Christopher Jermaine
2020 : Swarat Chaudhuri Talk
Swarat Chaudhuri
2020 Poster: Learning Differentiable Programs with Admissible Neural Heuristics
Ameesh Shah · Eric Zhan · Jennifer J Sun · Abhinav Verma · Yisong Yue · Swarat Chaudhuri
2019 Poster: Imitation-Projected Programmatic Reinforcement Learning
Abhinav Verma · Hoang Le · Yisong Yue · Swarat Chaudhuri