Confidence-Aware Imitation Learning from Demonstrations with Varying Optimality
Most existing imitation learning approaches assume the demonstrations are drawn from experts who are optimal, but relaxing this assumption enables us to use a wider range of data. Standard imitation learning may learn a suboptimal policy from demonstrations with varying optimality. Prior works use confidence scores or rankings to capture beneficial information from such demonstrations, but they suffer from limitations such as requiring manually annotated confidence scores or a high average optimality of the demonstrations. In this paper, we propose a general framework for learning from demonstrations with varying optimality that jointly learns the confidence scores and a well-performing policy. Our approach, Confidence-Aware Imitation Learning (CAIL), learns a well-performing policy from confidence-reweighted demonstrations, while using an outer loss to track the performance of our model and to learn the confidence. We provide theoretical guarantees on the convergence of CAIL and evaluate its performance in both simulated and real robot experiments. Our results show that CAIL significantly outperforms other imitation learning methods on demonstrations with varying optimality. We further show that even without access to any optimal demonstrations, CAIL can still learn a successful policy, and outperforms prior work.
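The abstract describes a bi-level scheme: an inner step imitates confidence-reweighted demonstrations, and an outer loss updates the per-demonstration confidences. The toy sketch below illustrates this structure only; the linear policy, the finite-difference outer gradient, and the small trusted evaluation set are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic 1-D demonstrations: the optimal policy is a = 2*s;
# noisy demonstrators deviate from it.
def make_demo(noise, n=50):
    s = rng.normal(size=n)
    a = 2.0 * s + noise * rng.normal(size=n)
    return s, a

demos = [make_demo(0.0), make_demo(0.0), make_demo(3.0), make_demo(3.0)]
eval_s, eval_a = make_demo(0.0)     # small trusted set driving the outer loss

conf = np.zeros(len(demos))         # one confidence logit per demonstration

def inner_policy(weights):
    """Inner step: weighted least-squares fit of a linear policy a = theta*s."""
    num = sum(c * (s @ a) for c, (s, a) in zip(weights, demos))
    den = sum(c * (s @ s) for c, (s, a) in zip(weights, demos))
    return num / den

for _ in range(200):
    w = np.exp(conf) / np.exp(conf).sum()        # softmax over confidences
    theta = inner_policy(w)
    # Outer loss: imitation error of the inner policy on the trusted set.
    outer = np.mean((theta * eval_s - eval_a) ** 2)
    # Finite-difference gradient of the outer loss w.r.t. each logit.
    grad = np.zeros_like(conf)
    eps = 1e-4
    for i in range(len(conf)):
        c2 = conf.copy()
        c2[i] += eps
        w2 = np.exp(c2) / np.exp(c2).sum()
        outer2 = np.mean((inner_policy(w2) * eval_s - eval_a) ** 2)
        grad[i] = (outer2 - outer) / eps
    conf -= 5.0 * grad                           # outer gradient step

w = np.exp(conf) / np.exp(conf).sum()
print(w.round(3))  # noisy demonstrations should end up with low confidence
```

After training, the confidence weights concentrate on the clean demonstrations, so the reweighted inner fit recovers a policy close to the optimal one even though half the data is noisy.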
Author Information
Songyuan Zhang (Massachusetts Institute of Technology)
Zhangjie Cao (Stanford University)
Dorsa Sadigh (Stanford)
Yanan Sui (California Institute of Technology)
More from the Same Authors
- 2021 : When Humans Aren’t Optimal: Robots that Collaborate with Risk-Aware Humans
  Minae Kwon · Erdem Biyik · Aditi Talati · Karan Bhasin · Dylan Losey · Dorsa Sadigh
- 2023 Poster: Inverse Preference Learning: Preference-based RL without a Reward Function
  Joey Hejna · Dorsa Sadigh
- 2023 Poster: Parallel Sampling of Diffusion Models
  Andy Shih · Suneel Belkhale · Stefano Ermon · Dorsa Sadigh · Nima Anari
- 2023 Poster: Diverse Conventions for Human-AI Collaboration
  Bidipta Sarkar · Andy Shih · Dorsa Sadigh
- 2023 Poster: Data Quality in Imitation Learning
  Suneel Belkhale · Yuchen Cui · Dorsa Sadigh
- 2023 Poster: RoboCLIP: One Demonstration is Enough to Learn Robot Policies
  Sumedh Sontakke · Séb Arnold · Jesse Zhang · Karl Pertsch · Erdem Biyik · Dorsa Sadigh · Chelsea Finn · Laurent Itti
- 2022 : Panel Discussion
  Kamalika Chaudhuri · Been Kim · Dorsa Sadigh · Huan Zhang · Linyi Li
- 2022 : Invited Talk: Dorsa Sadigh
  Dorsa Sadigh
- 2022 : Dorsa Sadigh: Aligning Robot Representations with Humans
  Dorsa Sadigh
- 2022 : Aligning Humans and Robots: Active Elicitation of Informative and Compatible Queries
  Dorsa Sadigh
- 2022 : Invited Talk: Dorsa Sadigh
  Dorsa Sadigh · Siddharth Karamcheti
- 2022 Poster: Assistive Teaching of Motor Control Tasks to Humans
  Megha Srivastava · Erdem Biyik · Suvir Mirchandani · Noah Goodman · Dorsa Sadigh
- 2022 Poster: Training and Inference on Any-Order Autoregressive Models the Right Way
  Andy Shih · Dorsa Sadigh · Stefano Ermon
- 2021 : Invited Talk: Dorsa Sadigh (Stanford University) on The Role of Conventions in Adaptive Human-AI Interaction
  Dorsa Sadigh
- 2021 Poster: Safe Policy Optimization with Local Generalized Linear Function Approximations
  Akifumi Wachi · Yunyue Wei · Yanan Sui
- 2021 Poster: HyperSPNs: Compact and Expressive Probabilistic Circuits
  Andy Shih · Dorsa Sadigh · Stefano Ermon
- 2021 Poster: Imitation with Neural Density Models
  Kuno Kim · Akshat Jindal · Yang Song · Jiaming Song · Yanan Sui · Stefano Ermon
- 2021 Poster: ELLA: Exploration through Learned Language Abstraction
  Suvir Mirchandani · Siddharth Karamcheti · Dorsa Sadigh
- 2020 : Discussion Panel
  Pete Florence · Dorsa Sadigh · Carolina Parada · Jeannette Bohg · Roberto Calandra · Peter Stone · Fabio Ramos
- 2020 : Invited Talk - "Walking the Boundary of Learning and Interaction"
  Dorsa Sadigh · Erdem Biyik
- 2018 : Panel
  Yimeng Zhang · Alfredo Canziani · Marco Pavone · Dorsa Sadigh · Kurt Keutzer
- 2018 : Invited Talk: Dorsa Sadigh, Stanford
  Dorsa Sadigh
- 2018 : Dorsa Sadigh
  Dorsa Sadigh
- 2018 Poster: Multi-Agent Generative Adversarial Imitation Learning
  Jiaming Song · Hongyu Ren · Dorsa Sadigh · Stefano Ermon
- 2018 Poster: Conditional Adversarial Domain Adaptation
  Mingsheng Long · Zhangjie Cao · Jianmin Wang · Michael Jordan
- 2017 : Coffee break and Poster Session I
  Nishith Khandwala · Steve Gallant · Gregory Way · Aniruddh Raghu · Li Shen · Aydan Gasimova · Alican Bozkurt · William Boag · Daniel Lopez-Martinez · Ulrich Bodenhofer · Samaneh Nasiri GhoshehBolagh · Michelle Guo · Christoph Kurz · Kirubin Pillay · Kimis Perros · George H Chen · Alexandre Yahi · Madhumita Sushil · Sanjay Purushotham · Elena Tutubalina · Tejpal Virdi · Marc-Andre Schulz · Samuel Weisenthal · Bharat Srikishan · Petar Veličković · Kartik Ahuja · Andrew Miller · Erin Craig · Disi Ji · Filip Dabek · Chloé Pou-Prom · Hejia Zhang · Janani Kalyanam · Wei-Hung Weng · Harish Bhat · Hugh Chen · Simon Kohl · Mingwu Gao · Tingting Zhu · Ming-Zher Poh · Iñigo Urteaga · Antoine Honoré · Alessandro De Palma · Maruan Al-Shedivat · Pranav Rajpurkar · Matthew McDermott · Vincent Chen · Yanan Sui · Yun-Geun Lee · Li-Fang Cheng · Chen Fang · Sibt ul Hussain · Cesare Furlanello · Zeev Waks · Hiba Chougrad · Hedvig Kjellstrom · Finale Doshi-Velez · Wolfgang Fruehwirt · Yanqing Zhang · Lily Hu · Junfang Chen · Sunho Park · Gatis Mikelsons · Jumana Dakka · Stephanie Hyland · yann chevaleyre · Hyunwoo Lee · Xavier Giro-i-Nieto · David Kale · Michael Hughes · Gabriel Erion · Rishab Mehra · William Zame · Stojan Trajanovski · Prithwish Chakraborty · Kelly Peterson · Muktabh Mayank Srivastava · Amy Jin · Heliodoro Tejeda Lemus · Priyadip Ray · Tamas Madl · Joseph Futoma · Enhao Gong · Syed Rameel Ahmad · Eric Lei · Ferdinand Legros
- 2017 Poster: Learning Multiple Tasks with Multilinear Relationship Networks
  Mingsheng Long · Zhangjie Cao · Jianmin Wang · Philip S Yu