Poster
Offline Goal-Conditioned Reinforcement Learning via $f$-Advantage Regression
Jason Yecheng Ma · Jason Yan · Dinesh Jayaraman · Osbert Bastani
Offline goal-conditioned reinforcement learning (GCRL) promises general-purpose skill learning in the form of reaching diverse goals from purely offline datasets. We propose $\textbf{Go}$al-conditioned $f$-$\textbf{A}$dvantage $\textbf{R}$egression (GoFAR), a novel regression-based offline GCRL algorithm derived from a state-occupancy matching perspective; the key intuition is that the goal-reaching task can be formulated as a state-occupancy matching problem between a dynamics-abiding imitator agent and an expert agent that directly teleports to the goal. In contrast to prior approaches, GoFAR does not require any hindsight relabeling and enjoys uninterleaved optimization for its value and policy networks. These distinct features give GoFAR much better offline performance and stability, as well as a statistical performance guarantee that is unattainable for prior methods. Furthermore, we demonstrate that GoFAR's training objectives can be re-purposed to learn an agent-independent goal-conditioned planner from purely offline source-domain data, which enables zero-shot transfer to new target domains. Through extensive experiments, we validate GoFAR's effectiveness in various problem settings and tasks, where it significantly outperforms the prior state of the art. Notably, on a real robotic dexterous manipulation task where no other method makes meaningful progress, GoFAR acquires complex manipulation behavior that successfully accomplishes diverse goals.
Author Information
Jason Yecheng Ma (University of Pennsylvania)
Jason Yan (University of Pennsylvania)
Dinesh Jayaraman (University of Pennsylvania)
I am an assistant professor at UPenn’s GRASP lab. I lead the Perception, Action, and Learning (PAL) Research Group, where we work on problems at the intersection of computer vision, machine learning, and robotics.
Osbert Bastani (University of Pennsylvania)
More from the Same Authors
-
2020 : Paper 50: Diverse Sampling for Flow-Based Trajectory Forecasting »
Jason Yecheng Ma · Jeevana Priya Inala · Dinesh Jayaraman · Osbert Bastani -
2021 Spotlight: Program Synthesis Guided Reinforcement Learning for Partially Observed Environments »
Yichen Yang · Jeevana Priya Inala · Osbert Bastani · Yewen Pu · Armando Solar-Lezama · Martin Rinard -
2021 : Conservative and Adaptive Penalty for Model-Based Safe Reinforcement Learning »
Jason Yecheng Ma · Andrew Shen · Osbert Bastani · Dinesh Jayaraman -
2021 : Specification-Guided Learning of Nash Equilibria with High Social Welfare »
Kishor Jothimurugan · Suguman Bansal · Osbert Bastani · Rajeev Alur -
2021 : PAC Synthesis of Machine Learning Programs »
Osbert Bastani -
2021 : Synthesizing Video Trajectory Queries »
Stephen Mell · Favyen Bastani · Stephan Zdancewic · Osbert Bastani -
2021 : Object Representations Guided By Optical Flow »
Jianing Qian · Dinesh Jayaraman -
2021 : Improving Human Decision-Making with Machine Learning »
Hamsa Bastani · Osbert Bastani · Park Sinchaisri -
2022 : Bandits for Online Calibration: An Application to Content Moderation on Social Media Platforms »
Vashist Avadhanula · Omar Abdul Baki · Hamsa Bastani · Osbert Bastani · Caner Gocmen · Daniel Haimovich · Darren Hwang · Dmytro Karamshuk · Thomas Leeper · Jiayuan Ma · Gregory macnamara · Jake Mullet · Christopher Palow · Sung Park · Varun S Rajagopal · Kevin Schaeffer · Parikshit Shah · Deeksha Sinha · Nicolas Stier-Moses · Ben Xu -
2022 : Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training »
Jason Yecheng Ma · Shagun Sodhani · Dinesh Jayaraman · Osbert Bastani · Vikash Kumar · Amy Zhang -
2022 : VIP: Towards Universal Visual Reward and Representation via Value-Implicit Pre-Training »
Jason Yecheng Ma · Shagun Sodhani · Dinesh Jayaraman · Osbert Bastani · Vikash Kumar · Amy Zhang -
2022 : Learning a Meta-Controller for Dynamic Grasping »
Yinsen Jia · Jingxi Xu · Dinesh Jayaraman · Shuran Song -
2022 : Policy Aware Model Learning via Transition Occupancy Matching »
Jason Yecheng Ma · Kausik Sivakumar · Osbert Bastani · Dinesh Jayaraman -
2022 : Robust Option Learning for Adversarial Generalization »
Kishor Jothimurugan · Steve Hsu · Osbert Bastani · Rajeev Alur -
2023 Poster: Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence? »
Arjun Majumdar · Karmesh Yadav · Sergio Arnaud · Jason Yecheng Ma · Claire Chen · Sneha Silwal · Aryan Jain · Vincent-Pierre Berges · Tingfan Wu · Jay Vakil · Pieter Abbeel · Jitendra Malik · Dhruv Batra · Yixin Lin · Oleksandr Maksymets · Aravind Rajeswaran · Franziska Meier -
2023 Workshop: Goal-Conditioned Reinforcement Learning »
Benjamin Eysenbach · Ishan Durugkar · Jason Yecheng Ma · Andi Peng · Tongzhou Wang · Amy Zhang -
2022 Poster: PAC Prediction Sets for Meta-Learning »
Sangdon Park · Edgar Dobriban · Insup Lee · Osbert Bastani -
2022 Poster: Neurosymbolic Deep Generative Models for Sequence Data with Relational Constraints »
Halley Young · Maxwell Du · Osbert Bastani -
2022 Poster: Regret Bounds for Risk-Sensitive Reinforcement Learning »
Osbert Bastani · Jason Yecheng Ma · Estelle Shen · Wanqiao Xu -
2022 Poster: Practical Adversarial Multivalid Conformal Prediction »
Osbert Bastani · Varun Gupta · Christopher Jung · Georgy Noarov · Ramya Ramalingam · Aaron Roth -
2021 Poster: Conservative Offline Distributional Reinforcement Learning »
Jason Yecheng Ma · Dinesh Jayaraman · Osbert Bastani -
2021 Poster: Compositional Reinforcement Learning from Logical Specifications »
Kishor Jothimurugan · Suguman Bansal · Osbert Bastani · Rajeev Alur -
2021 Poster: Program Synthesis Guided Reinforcement Learning for Partially Observed Environments »
Yichen Yang · Jeevana Priya Inala · Osbert Bastani · Yewen Pu · Armando Solar-Lezama · Martin Rinard -
2021 Poster: Learning Models for Actionable Recourse »
Alexis Ross · Himabindu Lakkaraju · Osbert Bastani -
2020 Poster: Neurosymbolic Transformers for Multi-Agent Communication »
Jeevana Priya Inala · Yichen Yang · James Paulos · Yewen Pu · Osbert Bastani · Vijay Kumar · Martin Rinard · Armando Solar-Lezama -
2020 Session: Orals & Spotlights Track 18: Deep Learning »
Yale Song · Dinesh Jayaraman -
2020 Poster: Long-Horizon Visual Planning with Goal-Conditioned Hierarchical Predictors »
Karl Pertsch · Oleh Rybkin · Frederik Ebert · Shenghao Zhou · Dinesh Jayaraman · Chelsea Finn · Sergey Levine -
2020 Poster: Fighting Copycat Agents in Behavioral Cloning from Observation Histories »
Chuan Wen · Jierui Lin · Trevor Darrell · Dinesh Jayaraman · Yang Gao -
2019 : Poster Session »
Matthia Sabatelli · Adam Stooke · Amir Abdi · Paulo Rauber · Leonard Adolphs · Ian Osband · Hardik Meisheri · Karol Kurach · Johannes Ackermann · Matt Benatan · GUO ZHANG · Chen Tessler · Dinghan Shen · Mikayel Samvelyan · Riashat Islam · Murtaza Dalal · Luke Harries · Andrey Kurenkov · Konrad Żołna · Sudeep Dasari · Kristian Hartikainen · Ofir Nachum · Kimin Lee · Markus Holzleitner · Vu Nguyen · Francis Song · Christopher Grimm · Felipe Leno da Silva · Yuping Luo · Yifan Wu · Alex Lee · Thomas Paine · Wei-Yang Qu · Daniel Graves · Yannis Flet-Berliac · Yunhao Tang · Suraj Nair · Matthew Hausknecht · Akhil Bagaria · Simon Schmitt · Bowen Baker · Paavo Parmas · Benjamin Eysenbach · Lisa Lee · Siyu Lin · Daniel Seita · Abhishek Gupta · Riley Simmons-Edler · Yijie Guo · Kevin Corder · Vikash Kumar · Scott Fujimoto · Adam Lerer · Ignasi Clavera Gilaberte · Nicholas Rhinehart · Ashvin Nair · Ge Yang · Lingxiao Wang · Sungryull Sohn · J. Fernando Hernandez-Garcia · Xian Yeow Lee · Rupesh Srivastava · Khimya Khetarpal · Chenjun Xiao · Luckeciano Carvalho Melo · Rishabh Agarwal · Tianhe Yu · Glen Berseth · Devendra Singh Chaplot · Jie Tang · Anirudh Srinivasan · Tharun Kumar Reddy Medini · Aaron Havens · Misha Laskin · Asier Mujika · Rohan Saphal · Joseph Marino · Alex Ray · Joshua Achiam · Ajay Mandlekar · Zhuang Liu · Danijar Hafner · Zhiwen Tang · Ted Xiao · Michael Walton · Jeff Druce · Ferran Alet · Zhang-Wei Hong · Stephanie Chan · Anusha Nagabandi · Hao Liu · Hao Sun · Ge Liu · Dinesh Jayaraman · John Co-Reyes · Sophia Sanborn -
2019 Poster: A Composable Specification Language for Reinforcement Learning Tasks »
Kishor Jothimurugan · Rajeev Alur · Osbert Bastani -
2018 Poster: Verifiable Reinforcement Learning via Policy Extraction »
Osbert Bastani · Yewen Pu · Armando Solar-Lezama