A key challenge for reinforcement learning is solving long-horizon planning problems. Recent work has leveraged programs to guide reinforcement learning in these settings. However, these approaches impose a high manual burden on users, who must provide a guiding program for every new task. Partially observed environments further complicate the programming task because the program must implement a strategy that correctly, and ideally optimally, handles every possible configuration of the hidden regions of the environment. We propose a new approach, model predictive program synthesis (MPPS), that uses program synthesis to automatically generate the guiding programs. It trains a generative model to predict the unobserved portions of the world, and then synthesizes a program based on samples from this model in a way that is robust to the model's uncertainty. In our experiments, we show that our approach significantly outperforms non-program-guided approaches on a set of challenging benchmarks, including a 2D Minecraft-inspired environment where the agent must complete a complex sequence of subtasks to achieve its goal, and achieves performance similar to that obtained with handcrafted guiding programs. Our results demonstrate that our approach can obtain the benefits of program-guided reinforcement learning without requiring the user to provide a new guiding program for every new task.
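The sample-then-synthesize idea in the abstract can be illustrated with a small toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the four-zone world, the Bernoulli `prior` standing in for the learned generative model, the brute-force enumeration standing in for the synthesizer, and the names `sample_world`, `succeeds`, and `synthesize`. It only shows one plausible reading of "robust to uncertainty": pick the candidate program that succeeds on the largest number of sampled completions of the hidden world.

```python
import random
from itertools import permutations

# Hypothetical toy world: four zones, each of which may hold a needed resource.
ZONES = ["A", "B", "C", "D"]


def sample_world(observed, prior, rng):
    """Stand-in for a learned generative model (an assumption, not the paper's model).

    Observed zones keep their known value; each hidden zone is filled in
    by an independent Bernoulli draw with probability prior[zone].
    """
    return {
        z: observed[z] if z in observed else rng.random() < prior[z]
        for z in ZONES
    }


def succeeds(program, world):
    """A program is a tuple of zones to visit; it succeeds if any holds the resource."""
    return any(world[z] for z in program)


def synthesize(observed, prior, n_samples=200, budget=2, seed=0):
    """Return the candidate program that succeeds on the most sampled worlds.

    Brute-force enumeration stands in for a real program synthesizer.
    """
    rng = random.Random(seed)
    worlds = [sample_world(observed, prior, rng) for _ in range(n_samples)]
    candidates = list(permutations(ZONES, budget))
    best = max(candidates, key=lambda p: sum(succeeds(p, w) for w in worlds))
    rate = sum(succeeds(best, w) for w in worlds) / n_samples
    return best, rate


if __name__ == "__main__":
    # Zone A is observed to be empty; B, C, and D are hidden.
    program, rate = synthesize({"A": False}, {"B": 0.9, "C": 0.1, "D": 0.8})
    print(program, rate)
```

In a model predictive setting, this selection step would be rerun as the agent observes more of the world, so that the `observed` dictionary grows and the sampled completions become less uncertain.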
Author Information
Yichen Yang (MIT)
Jeevana Priya Inala (Microsoft Research)
Osbert Bastani (University of Pennsylvania)
Yewen Pu (Autodesk)
Armando Solar-Lezama (MIT)
Martin Rinard (MIT)
Related Events (a corresponding poster, oral, or spotlight)
-
2021 Spotlight: Program Synthesis Guided Reinforcement Learning for Partially Observed Environments »
More from the Same Authors
-
2020 : Paper 50: Diverse Sampling for Flow-Based Trajectory Forecasting »
Jason Yecheng Ma · Jeevana Priya Inala · Dinesh Jayaraman · Osbert Bastani -
2021 : AutumnSynth: Synthesis of Reactive Programs with Structured Latent State »
Ria Das · Zenna Tavares · Josh Tenenbaum · Armando Solar-Lezama -
2021 : PAC Synthesis of Machine Learning Programs »
Osbert Bastani -
2021 : Synthesizing Video Trajectory Queries »
Stephen Mell · Favyen Bastani · Stephan Zdancewic · Osbert Bastani -
2021 : Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning »
Jason Yecheng Ma · Andrew Shen · Osbert Bastani · Dinesh Jayaraman -
2021 : Synthesis of Reactive Programs with Structured Latent State »
Ria Das · Zenna Tavares · Armando Solar-Lezama · Josh Tenenbaum -
2022 : Neurosymbolic Programming for Science »
Jennifer J Sun · Megan Tjandrasuwita · Atharva Sehgal · Armando Solar-Lezama · Swarat Chaudhuri · Yisong Yue · Omar Costilla Reyes -
2022 : Lemma: Bootstrapping High-Level Mathematical Reasoning with Learned Symbolic Abstractions »
Zhening Li · Gabriel Poesia Reis e Silva · Omar Costilla Reyes · Noah Goodman · Armando Solar-Lezama -
2022 Spotlight: Fault-Aware Neural Code Rankers »
Jeevana Priya Inala · Chenglong Wang · Mei Yang · Andres Codas · Mark Encarnación · Shuvendu Lahiri · Madanlal Musuvathi · Jianfeng Gao -
2022 : Panel »
Jeevana Priya Inala · Pushmeet Kohli · Ann Kennedy · Sriram Rajamani · Yisong Yue -
2022 : Q & A »
Swarat Chaudhuri · Jennifer J Sun · Armando Solar-Lezama -
2022 Tutorial: Neurosymbolic Programming »
Swarat Chaudhuri · Jennifer J Sun · Armando Solar-Lezama -
2022 : Neurosymbolic Programming »
Swarat Chaudhuri · Jennifer J Sun · Armando Solar-Lezama -
2022 : Human Evaluation of Text-to-Image Models on a Multi-Task Benchmark »
Vitali Petsiuk · Alexander E. Siemenn · Saisamrit Surbehera · Qi Qi Chin · Keith Tyser · Gregory Hunter · Arvind Raghavan · Yann Hicke · Bryan Plummer · Ori Kerret · Tonio Buonassisi · Kate Saenko · Armando Solar-Lezama · Iddo Drori -
2022 Poster: Fault-Aware Neural Code Rankers »
Jeevana Priya Inala · Chenglong Wang · Mei Yang · Andres Codas · Mark Encarnación · Shuvendu Lahiri · Madanlal Musuvathi · Jianfeng Gao -
2022 Poster: Communicating Natural Programs to Humans and Machines »
Sam Acquaviva · Yewen Pu · Marta Kryven · Theodoros Sechopoulos · Catherine Wong · Gabrielle Ecanow · Maxwell Nye · Michael Tessler · Josh Tenenbaum -
2021 : Efficient Pragmatic Program Synthesis with Informative Specifications »
Saujas Vaduguru · Yewen Pu · Kevin Ellis -
2021 Poster: Conservative Offline Distributional Reinforcement Learning »
Jason Yecheng Ma · Dinesh Jayaraman · Osbert Bastani -
2021 Poster: Towards Context-Agnostic Learning Using Synthetic Data »
Charles Jin · Martin Rinard -
2021 Poster: Compositional Reinforcement Learning from Logical Specifications »
Kishor Jothimurugan · Suguman Bansal · Osbert Bastani · Rajeev Alur -
2021 Poster: Learning Models for Actionable Recourse »
Alexis Ross · Himabindu Lakkaraju · Osbert Bastani -
2020 : Invited Talk (Armando Solar-Lezama) »
Armando Solar-Lezama -
2020 Workshop: Workshop on Computer Assisted Programming (CAP) »
Augustus Odena · Charles Sutton · Nadia Polikarpova · Josh Tenenbaum · Armando Solar-Lezama · Isil Dillig -
2020 Poster: Program Synthesis with Pragmatic Communication »
Yewen Pu · Kevin Ellis · Marta Kryven · Josh Tenenbaum · Armando Solar-Lezama -
2020 Poster: Learning Compositional Rules via Neural Program Synthesis »
Maxwell Nye · Armando Solar-Lezama · Josh Tenenbaum · Brenden Lake -
2020 Poster: Neurosymbolic Transformers for Multi-Agent Communication »
Jeevana Priya Inala · Yichen Yang · James Paulos · Yewen Pu · Osbert Bastani · Vijay Kumar · Martin Rinard · Armando Solar-Lezama -
2020 Poster: Efficient Exact Verification of Binarized Neural Networks »
Kai Jia · Martin Rinard -
2019 : Break / Poster Session 1 »
Antonia Marcu · Yao-Yuan Yang · Pascale Gourdeau · Chen Zhu · Thodoris Lykouris · Jianfeng Chi · Mark Kozdoba · Arjun Nitin Bhagoji · Xiaoxia Wu · Jay Nandy · Michael T Smith · Bingyang Wen · Yuege Xie · Konstantinos Pitas · Suprosanna Shit · Maksym Andriushchenko · Dingli Yu · Gaël Letarte · Misha Khodak · Hussein Mozannar · Chara Podimata · James Foulds · Yizhen Wang · Huishuai Zhang · Ondrej Kuzelka · Alexander Levine · Nan Lu · Zakaria Mhammedi · Paul Viallard · Diana Cai · Lovedeep Gondara · James Lucas · Yasaman Mahdaviyeh · Aristide Baratin · Rishi Bommasani · Alessandro Barp · Andrew Ilyas · Kaiwen Wu · Jens Behrmann · Omar Rivasplata · Amir Nazemi · Aditi Raghunathan · Will Stephenson · Sahil Singla · Akhil Gupta · YooJung Choi · Yannic Kilcher · Clare Lyle · Edoardo Manino · Andrew Bennett · Zhi Xu · Niladri Chatterji · Emre Barut · Flavien Prost · Rodrigo Toro Icarte · Arno Blaas · Chulhee Yun · Sahin Lale · YiDing Jiang · Tharun Kumar Reddy Medini · Ashkan Rezaei · Alexander Meinke · Stephen Mell · Gary Kazantsev · Shivam Garg · Aradhana Sinha · Vishnu Lokhande · Geovani Rizk · Han Zhao · Aditya Kumar Akash · Jikai Hou · Ali Ghodsi · Matthias Hein · Tyler Sypherd · Yichen Yang · Anastasia Pentina · Pierre Gillot · Antoine Ledent · Guy Gur-Ari · Noah MacAulay · Tianzong Zhang -
2019 Poster: Write, Execute, Assess: Program Synthesis with a REPL »
Kevin Ellis · Maxwell Nye · Yewen Pu · Felix Sosa · Josh Tenenbaum · Armando Solar-Lezama -
2018 Poster: Learning to Infer Graphics Programs from Hand-Drawn Images »
Kevin Ellis · Daniel Ritchie · Armando Solar-Lezama · Josh Tenenbaum -
2018 Poster: Learning Libraries of Subroutines for Neurally–Guided Bayesian Program Induction »
Kevin Ellis · Lucas Morales · Mathias Sablé-Meyer · Armando Solar-Lezama · Josh Tenenbaum -
2018 Spotlight: Learning to Infer Graphics Programs from Hand-Drawn Images »
Kevin Ellis · Daniel Ritchie · Armando Solar-Lezama · Josh Tenenbaum -
2018 Spotlight: Learning Libraries of Subroutines for Neurally–Guided Bayesian Program Induction »
Kevin Ellis · Lucas Morales · Mathias Sablé-Meyer · Armando Solar-Lezama · Josh Tenenbaum -
2018 Poster: Verifiable Reinforcement Learning via Policy Extraction »
Osbert Bastani · Yewen Pu · Armando Solar-Lezama -
2018 Poster: Interpreting Neural Network Judgments via Minimal, Stable, and Symbolic Corrections »
Xin Zhang · Armando Solar-Lezama · Rishabh Singh -
2016 Poster: Sampling for Bayesian Program Learning »
Kevin Ellis · Armando Solar-Lezama · Josh Tenenbaum -
2015 Poster: Unsupervised Learning by Program Synthesis »
Kevin Ellis · Armando Solar-Lezama · Josh Tenenbaum