This paper studies offline Imitation Learning (IL), where an agent learns to imitate an expert demonstrator without additional online environment interactions. Instead, the learner is given a static offline dataset of state-action-next-state triples from a potentially less proficient behavior policy. We introduce Model-based IL from Offline data (MILO): an algorithmic framework that utilizes this static dataset to solve the offline IL problem efficiently both in theory and in practice. In theory, even if the behavior policy is highly sub-optimal compared to the expert, we show that as long as the data from the behavior policy provides sufficient coverage of the expert's state-action traces (without requiring global coverage of the entire state-action space), MILO can provably combat the covariate shift issue in IL. Complementing our theoretical results, we demonstrate that a practical implementation of our approach mitigates covariate shift on benchmark MuJoCo continuous control tasks. We show that with behavior policies whose performance is less than half that of the expert, MILO still successfully imitates from a very small number of expert state-action pairs, while traditional offline IL methods such as behavior cloning (BC) fail completely. Source code is provided at https://github.com/jdchang1/milo.
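
To make the problem setup concrete, below is a minimal, self-contained numpy sketch of the two-stage idea the abstract describes: fit a dynamics model from the behavior policy's (state, action, next-state) triples, then fit and roll out a policy against a small set of expert state-action pairs entirely inside that learned model. The linear dynamics model, the plain regression used as the imitation step, and all variable and function names below are illustrative assumptions for this sketch, not the paper's actual algorithm; for MILO itself, see the paper and the linked repository.

import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (assumed for illustration): state dim, action dim,
# offline dataset size, and number of expert state-action pairs.
ds, da, n_offline, n_expert = 4, 2, 5000, 50

# Offline (s, a, s') triples collected by a (possibly sub-optimal) behavior policy.
S = rng.normal(size=(n_offline, ds))
A = rng.normal(size=(n_offline, da))
B_true = rng.normal(size=(da, ds))
S_next = S + 0.1 * A @ B_true + 0.01 * rng.normal(size=(n_offline, ds))

# A small set of expert state-action pairs (no expert next-states are needed).
S_exp = rng.normal(size=(n_expert, ds))
A_exp = 0.5 * S_exp @ rng.normal(size=(ds, da))

# Stage 1: fit a crude linear dynamics model s' ~ [s, a] @ W from the offline data.
X = np.concatenate([S, A], axis=1)
W, *_ = np.linalg.lstsq(X, S_next, rcond=None)

def predict_next(s, a):
    """One-step prediction from the learned linear dynamics model."""
    return np.concatenate([s, a]) @ W

# Stage 2: fit a linear policy a = s @ K to the expert pairs and roll it out
# inside the learned model (a simple stand-in for the paper's imitation objective).
K, *_ = np.linalg.lstsq(S_exp, A_exp, rcond=None)

s = rng.normal(size=ds)
for _ in range(10):  # short rollout performed entirely inside the learned model
    a = s @ K
    s = predict_next(s, a)
print("state norm after a 10-step model rollout:", np.linalg.norm(s))

The point of the sketch is only the data flow: the dynamics model is trained on behavior-policy data, while the expert contributes only state-action pairs; everything after Stage 1 touches the learned model rather than the real environment.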
Author Information
Jonathan Chang (Cornell University)
Masatoshi Uehara (Cornell University)
Dhruv Sreenivas (Cornell University)
Rahul Kidambi (Amazon Search & AI)
Wen Sun (Cornell University)
More from the Same Authors
- 2021: Pessimistic Model-based Offline Reinforcement Learning under Partial Coverage
  Masatoshi Uehara · Wen Sun
- 2022: Deep Multi-Modal Structural Equations For Causal Effect Estimation With Unstructured Proxies
  Shachi Deshpande · Kaiwen Wang · Dhruv Sreenivas · Zheng Li · Volodymyr Kuleshov
- 2022 Poster: Provably Efficient Reinforcement Learning in Partially Observable Dynamical Systems
  Masatoshi Uehara · Ayush Sekhari · Jason Lee · Nathan Kallus · Wen Sun
- 2022 Poster: Deep Multi-Modal Structural Equations For Causal Effect Estimation With Unstructured Proxies
  Shachi Deshpande · Kaiwen Wang · Dhruv Sreenivas · Zheng Li · Volodymyr Kuleshov
- 2021: Representation Learning for Online and Offline RL in Low-rank MDPs
  Masatoshi Uehara · Xuezhou Zhang · Wen Sun
- 2021 Workshop: Causal Inference Challenges in Sequential Decision Making: Bridging Theory and Practice
  Aurelien Bibaut · Maria Dimakopoulou · Nathan Kallus · Xinkun Nie · Masatoshi Uehara · Kelly Zhang
- 2021 Poster: MobILE: Model-Based Imitation Learning From Observation Alone
  Rahul Kidambi · Jonathan Chang · Wen Sun
- 2020 Poster: Off-Policy Evaluation and Learning for External Validity under a Covariate Shift
  Masatoshi Uehara · Masahiro Kato · Shota Yasui
- 2020 Spotlight: Off-Policy Evaluation and Learning for External Validity under a Covariate Shift
  Masatoshi Uehara · Masahiro Kato · Shota Yasui
- 2020 Poster: Doubly Robust Off-Policy Value and Gradient Estimation for Deterministic Policies
  Nathan Kallus · Masatoshi Uehara
- 2020 Poster: FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs
  Alekh Agarwal · Sham Kakade · Akshay Krishnamurthy · Wen Sun
- 2020 Poster: PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning
  Alekh Agarwal · Mikael Henaff · Sham Kakade · Wen Sun
- 2020 Poster: Multi-Robot Collision Avoidance under Uncertainty with Probabilistic Safety Barrier Certificates
  Wenhao Luo · Wen Sun · Ashish Kapoor
- 2020 Spotlight: Multi-Robot Collision Avoidance under Uncertainty with Probabilistic Safety Barrier Certificates
  Wenhao Luo · Wen Sun · Ashish Kapoor
- 2020 Oral: FLAMBE: Structural Complexity and Representation Learning of Low Rank MDPs
  Alekh Agarwal · Sham Kakade · Akshay Krishnamurthy · Wen Sun
- 2020 Poster: MOReL: Model-Based Offline Reinforcement Learning
  Rahul Kidambi · Aravind Rajeswaran · Praneeth Netrapalli · Thorsten Joachims
- 2020 Poster: Information Theoretic Regret Bounds for Online Nonlinear Control
  Sham Kakade · Akshay Krishnamurthy · Kendall Lowrey · Motoya Ohnishi · Wen Sun
- 2019 Poster: The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares
  Rong Ge · Sham Kakade · Rahul Kidambi · Praneeth Netrapalli
- 2018: Drew Bagnell / Wen Sun
  James Bagnell · Wen Sun
- 2018: Coffee Break and Poster Session I
  Pim de Haan · Bin Wang · Dequan Wang · Aadil Hayat · Ibrahim Sobh · Muhammad Asif Rana · Thibault Buhet · Nicholas Rhinehart · Arjun Sharma · Alex Bewley · Michael Kelly · Lionel Blondé · Ozgur S. Oguz · Vaibhav Viswanathan · Jeroen Vanbaar · Konrad Żołna · Negar Rostamzadeh · Rowan McAllister · Sanjay Thakur · Alexandros Kalousis · Chelsea Sidrane · Sujoy Paul · Daphne Chen · Michal Garmulewicz · Henryk Michalewski · Coline Devin · Hongyu Ren · Jiaming Song · Wen Sun · Hanzhang Hu · Wulong Liu · Emilie Wirbel
- 2018 Poster: Dual Policy Iteration
  Wen Sun · Geoffrey Gordon · Byron Boots · J. Bagnell
- 2017 Poster: Predictive-State Decoders: Encoding the Future into Recurrent Networks
  Arun Venkatraman · Nicholas Rhinehart · Wen Sun · Lerrel Pinto · Martial Hebert · Byron Boots · Kris Kitani · J. Bagnell
- 2015 Poster: Submodular Hamming Metrics
  Jennifer Gillenwater · Rishabh K Iyer · Bethany Lusch · Rahul Kidambi · Jeffrey A Bilmes
- 2015 Spotlight: Submodular Hamming Metrics
  Jennifer Gillenwater · Rishabh K Iyer · Bethany Lusch · Rahul Kidambi · Jeffrey A Bilmes