We study the problem of learning control policies for complex tasks given by logical specifications. Recent approaches automatically generate a reward function from a given specification and use a suitable reinforcement learning algorithm to learn a policy that maximizes the expected reward. These approaches, however, scale poorly to complex tasks that require high-level planning. In this work, we develop a compositional learning approach, called DIRL, that interleaves high-level planning and reinforcement learning. First, DIRL encodes the specification as an abstract graph; intuitively, vertices and edges of the graph correspond to regions of the state space and simpler sub-tasks, respectively. Our approach then incorporates reinforcement learning to learn neural network policies for each edge (sub-task) within a Dijkstra-style planning algorithm to compute a high-level plan in the graph. An evaluation of the proposed approach on a set of challenging control benchmarks with continuous state and action spaces demonstrates that it outperforms state-of-the-art baselines.
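The interleaving described above can be sketched in a few lines. The following is a minimal illustration, not the authors' implementation: the names dirl_plan and learn_edge_policy are hypothetical, the abstract graph is represented as a plain adjacency dict, and learn_edge_policy stands in for whichever RL algorithm trains a policy for the sub-task on an edge and returns it with a non-negative cost estimate (for instance, the negative log of the sub-task's success probability; an illustrative choice).

import heapq

def dirl_plan(graph, source, target, learn_edge_policy):
    """Dijkstra-style search over the abstract graph, training a policy per edge.

    graph: dict mapping each abstract vertex (region) to its successor vertices.
    learn_edge_policy(u, v): hypothetical callback; trains a policy for the
        sub-task of steering the system from region u to region v and returns
        (policy, cost), with cost a non-negative estimate of edge difficulty.
    """
    dist = {source: 0.0}          # cheapest known cost to reach each region
    policies = {}                 # best sub-task policy found per explored edge
    frontier = [(0.0, source)]    # min-heap of (accumulated cost, vertex)
    while frontier:
        cost, u = heapq.heappop(frontier)
        if cost > dist.get(u, float("inf")):
            continue              # stale heap entry; a cheaper path was found
        if u == target:
            break                 # cheapest high-level plan to target is settled
        for v in graph.get(u, ()):            # each edge (u, v) is a sub-task
            policy, edge_cost = learn_edge_policy(u, v)
            if cost + edge_cost < dist.get(v, float("inf")):
                dist[v] = cost + edge_cost
                policies[(u, v)] = policy
                heapq.heappush(frontier, (cost + edge_cost, v))
    return dist, policies

Chaining the stored edge policies along the cheapest source-to-target path yields the overall controller; with negative-log-probability edge costs, minimizing path cost corresponds to maximizing the product of sub-task success probabilities.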
Author Information
Kishor Jothimurugan (University of Pennsylvania)
Suguman Bansal (University of Pennsylvania)
Suguman Bansal is a postdoctoral researcher in the Department of Computer and Information Science at the University of Pennsylvania. Her research interests lie at the intersection of Artificial Intelligence and Programming Languages. Specifically, she works on developing tools and techniques to improve the quality of automated verification and synthesis of computational systems. Her recent work concerns providing formal guarantees about learning-enabled systems, with a focus on Reinforcement Learning. She received her Ph.D. (2020) and M.S. (2016) in Computer Science from Rice University, and her B.S. with Honors (2014) in Mathematics and Computer Science from Chennai Mathematical Institute. She is a recipient of the NSF/CRA Computing Innovation Fellowship (2020) and the Andrew Ladd Fellowship (2016), and was selected for EECS Rising Stars (2018), among other honors.
Osbert Bastani (University of Pennsylvania)
Rajeev Alur (University of Pennsylvania)
More from the Same Authors
- 2021 Spotlight: Program Synthesis Guided Reinforcement Learning for Partially Observed Environments
  Yichen Yang · Jeevana Priya Inala · Osbert Bastani · Yewen Pu · Armando Solar-Lezama · Martin Rinard
- 2021: Specification-Guided Learning of Nash Equilibria with High Social Welfare
  Kishor Jothimurugan · Suguman Bansal · Osbert Bastani · Rajeev Alur
- 2021: PAC Synthesis of Machine Learning Programs
  Osbert Bastani
- 2021: Synthesizing Video Trajectory Queries
  Stephen Mell · Favyen Bastani · Stephan Zdancewic · Osbert Bastani
- 2021: Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning
  Jason Yecheng Ma · Andrew Shen · Osbert Bastani · Dinesh Jayaraman
- 2022: Robust Option Learning for Adversarial Generalization
  Kishor Jothimurugan · Steve Hsu · Osbert Bastani · Rajeev Alur
- 2023 Poster: Stability Guarantees for Feature Attributions with Multiplicative Smoothing
  Anton Xue · Rajeev Alur · Eric Wong
- 2021 Poster: Conservative Offline Distributional Reinforcement Learning
  Jason Yecheng Ma · Dinesh Jayaraman · Osbert Bastani
- 2021 Poster: Program Synthesis Guided Reinforcement Learning for Partially Observed Environments
  Yichen Yang · Jeevana Priya Inala · Osbert Bastani · Yewen Pu · Armando Solar-Lezama · Martin Rinard
- 2021 Poster: Learning Models for Actionable Recourse
  Alexis Ross · Himabindu Lakkaraju · Osbert Bastani
- 2019 Poster: A Composable Specification Language for Reinforcement Learning Tasks
  Kishor Jothimurugan · Rajeev Alur · Osbert Bastani
- 2018 Poster: Verifiable Reinforcement Learning via Policy Extraction
  Osbert Bastani · Yewen Pu · Armando Solar-Lezama