More Efficient Adversarial Imitation Learning Algorithms With Known and Unknown Transitions
Tian Xu · Ziniu Li · Yang Yu
Tue Dec 14 09:00 AM -- 10:00 AM (PST)
In this work, we design provably (more) efficient imitation learning algorithms that directly optimize policies from expert demonstrations. First, when the transition function is known, we build on the nearly minimax-optimal algorithm MIMIC-MD and relax a projection operator within it. Based on this change, we develop an adversarial imitation learning (AIL) algorithm named TAIL with a gradient-based optimization procedure. Accordingly, TAIL has the same sample complexity (i.e., number of expert trajectories) $\widetilde{\mathcal{O}}(H^{3/2} |\mathcal{S}|/\varepsilon)$ as MIMIC-MD, where $H$ is the planning horizon, $|\mathcal{S}|$ is the state space size, and $\varepsilon$ is the desired policy value gap. This implies that TAIL improves on conventional AIL methods such as FEM and GTAL, whose sample complexity is $\widetilde{\mathcal{O}}(H^2 |\mathcal{S}| / \varepsilon^2)$ (a gap of a factor of $\sqrt{H}/\varepsilon$). In addition, TAIL is more practical than MIMIC-MD: the former has a space complexity of $\mathcal{O} (|\mathcal{S}||\mathcal{A}|H)$, while the latter's is about $\mathcal{O} (|\mathcal{S}|^2 |\mathcal{A}|^2 H^2)$. Second, when the transition function is unknown but interaction with the environment is allowed, we present an extension of TAIL named MB-TAIL. The sample complexity of MB-TAIL is $\widetilde{\mathcal{O}}(H^{3/2} |\mathcal{S}|/\varepsilon)$, while the interaction complexity (i.e., the number of interaction episodes) is $\widetilde{\mathcal{O}} (H^3 |\mathcal{S}|^2 |\mathcal{A}| / \varepsilon^2)$. In particular, MB-TAIL significantly improves on the best-known OAL algorithm in both sample complexity and interaction complexity. The advances in MB-TAIL rest on a new framework that connects reward-free exploration and AIL. To our knowledge, MB-TAIL is the first algorithm to transfer advances from the known-transition setting to the unknown-transition setting. Finally, we provide numerical results that support our theoretical claims and explain some empirical observations in practice.
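To make the setup concrete, below is a minimal tabular sketch of the generic adversarial imitation learning loop that this family of algorithms (FEM, GTAL, TAIL) instantiates in the known-transition setting: a reward player updates an adversarial reward on the gap between expert and learner occupancy measures, and a policy player best-responds by dynamic programming. This is a sketch only, not the authors' TAIL implementation; the function names, the projected-gradient reward update, the fixed initial state, and the input shapes are all illustrative assumptions.

    import numpy as np

    def best_response(P, w):
        """Backward dynamic programming: optimal deterministic policy under reward w.
        P: (H, S, A, S) known transition; w: (H, S, A) reward table."""
        H, S, _ = w.shape
        pi = np.zeros((H, S), dtype=int)
        V = np.zeros(S)
        for h in reversed(range(H)):
            Q = w[h] + P[h] @ V          # (S, A): reward plus expected next-step value
            pi[h] = Q.argmax(axis=1)
            V = Q.max(axis=1)
        return pi

    def occupancy(P, pi, s0=0):
        """State-action occupancy measure of deterministic policy pi.
        Fixed initial state s0 is an assumption of this sketch."""
        H, S, A, _ = P.shape
        d = np.zeros((H, S, A))
        rho = np.zeros(S)
        rho[s0] = 1.0
        for h in range(H):
            d[h, np.arange(S), pi[h]] = rho
            rho = np.einsum('s,st->t', rho, P[h, np.arange(S), pi[h]])
        return d

    def ail_known_transition(P, expert_counts, n_iters=500, lr=0.1):
        """Generic AIL with known P: the reward player takes projected gradient
        steps on <w, d_expert - d_pi>; the policy player best-responds.
        expert_counts: (H, S, A) state-action counts from expert trajectories."""
        d_expert = expert_counts / np.maximum(
            expert_counts.sum(axis=(1, 2), keepdims=True), 1)
        w = np.zeros_like(d_expert)      # adversarial reward, kept in [-1, 1]
        for _ in range(n_iters):
            pi = best_response(P, w)
            d_pi = occupancy(P, pi)
            w = np.clip(w + lr * (d_expert - d_pi), -1.0, 1.0)
        return pi                        # last iterate; analyses typically average policies

In the unknown-transition setting, an MB-TAIL-style extension would, per the framework described in the abstract, first gather transitions via reward-free exploration to estimate $P$ and then run a loop like the one above inside the estimated model; those exploration episodes are what the interaction-complexity bound counts.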
Author Information
Tian Xu (Nanjing University)
Ziniu Li (The Chinese University of Hong Kong, Shenzhen)
Yang Yu (Nanjing University)
More from the Same Authors
- 2022 Poster: Efficient Multi-agent Communication via Self-supervised Information Aggregation
  Cong Guan · Feng Chen · Lei Yuan · Chenghe Wang · Hao Yin · Zongzhang Zhang · Yang Yu
- 2022: Multi-Agent Policy Transfer via Task Relationship Modeling
  Rong-Jun Qin · Feng Chen · Tonghan Wang · Lei Yuan · Xiaoran Wu · Yipeng Kang · Zongzhang Zhang · Chongjie Zhang · Yang Yu
- 2023 Poster: Imitation Learning from Imperfection: Theoretical Justifications and Algorithms
  Ziniu Li · Tian Xu · Zeyu Qin · Yang Yu · Zhi-Quan Luo
- 2023 Poster: Adversarial Counterfactual Environment Model Learning
  Xiong-Hui Chen · Yang Yu · Zhengmao Zhu · ZhiHua Yu · Chen Zhenjun · Chenghe Wang · Yinan Wu · Rong-Jun Qin · Hongqiu Wu · Ruijin Ding · Huang Fangsheng
- 2023 Poster: Natural Language-conditioned Reinforcement Learning with Task-related Language Development and Translation
  Jingcheng Pang · Xin-Yu Yang · Si-Hang Yang · Xiong-Hui Chen · Yang Yu
- 2023 Poster: Learning World Models with Identifiable Factorization
  Yuren Liu · Biwei Huang · Zhengmao Zhu · Honglong Tian · Mingming Gong · Yang Yu · Kun Zhang
- 2022 Spotlight: Lightning Talks 5A-3
  Minting Pan · Xiang Chen · Wenhan Huang · Can Chang · Zhecheng Yuan · Jianzhun Shao · Yushi Cao · Peihao Chen · Ke Xue · Zhengrong Xue · Zhiqiang Lou · Xiangming Zhu · Lei Li · Zhiming Li · Kai Li · Jiacheng Xu · Dongyu Ji · Ni Mu · Kun Shao · Tianpei Yang · Kunyang Lin · Ningyu Zhang · Yunbo Wang · Lei Yuan · Bo Yuan · Hongchang Zhang · Jiajun Wu · Tianze Zhou · Xueqian Wang · Ling Pan · Yuhang Jiang · Xiaokang Yang · Xiaozhuan Liang · Hao Zhang · Weiwen Hu · Miqing Li · YAN ZHENG · Matthew Taylor · Huazhe Xu · Shumin Deng · Chao Qian · YI WU · Shuncheng He · Wenbing Huang · Chuanqi Tan · Zongzhang Zhang · Yang Gao · Jun Luo · Yi Li · Xiangyang Ji · Thomas Li · Mingkui Tan · Fei Huang · Yang Yu · Huazhe Xu · Dongge Wang · Jianye Hao · Chuang Gan · Yang Liu · Luo Si · Hangyu Mao · Huajun Chen · Jianye Hao · Jun Wang · Xiaotie Deng
- 2022 Spotlight: Multi-agent Dynamic Algorithm Configuration
  Ke Xue · Jiacheng Xu · Lei Yuan · Miqing Li · Chao Qian · Zongzhang Zhang · Yang Yu
- 2022 Spotlight: Bayesian Optimistic Optimization: Optimistic Exploration for Model-based Reinforcement Learning
  Chenyang Wu · Tianci Li · Zongzhang Zhang · Yang Yu
- 2022 Spotlight: Lightning Talks 4B-1
  Alexandra Senderovich · Zhijie Deng · Navid Ansari · Xuefei Ning · Yasmin Salehi · Xiang Huang · Chenyang Wu · Kelsey Allen · Jiaqi Han · Nikita Balagansky · Tatiana Lopez-Guevara · Tianci Li · Zhanhong Ye · Zixuan Zhou · Feng Zhou · Ekaterina Bulatova · Daniil Gavrilov · Wenbing Huang · Dennis Giannacopoulos · Hans-peter Seidel · Anton Obukhov · Kimberly Stachenfeld · Hongsheng Liu · Jun Zhu · Junbo Zhao · Hengbo Ma · Nima Vahidi Ferdowsi · Zongzhang Zhang · Vahid Babaei · Jiachen Li · Alvaro Sanchez Gonzalez · Yang Yu · Shi Ji · Maxim Rakhuba · Tianchen Zhao · Yiping Deng · Peter Battaglia · Josh Tenenbaum · Zidong Wang · Chuang Gan · Changcheng Tang · Jessica Hamrick · Kang Yang · Tobias Pfaff · Yang Li · Shuang Liang · Min Wang · Huazhong Yang · Haotian CHU · Yu Wang · Fan Yu · Bei Hua · Lei Chen · Bin Dong
- 2022 Poster: NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning
  Rong-Jun Qin · Xingyuan Zhang · Songyi Gao · Xiong-Hui Chen · Zewen Li · Weinan Zhang · Yang Yu
- 2022 Poster: Multi-agent Dynamic Algorithm Configuration
  Ke Xue · Jiacheng Xu · Lei Yuan · Miqing Li · Chao Qian · Zongzhang Zhang · Yang Yu
- 2022 Poster: Bayesian Optimistic Optimization: Optimistic Exploration for Model-based Reinforcement Learning
  Chenyang Wu · Tianci Li · Zongzhang Zhang · Yang Yu
- 2021: HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning
  Ziniu Li · Yingru Li · Yushun Zhang · Tong Zhang · Zhiquan Luo
- 2020 Poster: Error Bounds of Imitating Policies and Environments
  Tian Xu · Ziniu Li · Yang Yu
- 2020 Poster: Offline Imitation Learning with a Misspecified Simulator
  Shengyi Jiang · Jingcheng Pang · Yang Yu
- 2019 Poster: Bridging Machine Learning and Logical Reasoning by Abductive Learning
  Wang-Zhou Dai · Qiuling Xu · Yang Yu · Zhi-Hua Zhou
- 2017 Poster: Subset Selection under Noise
  Chao Qian · Jing-Cheng Shi · Yang Yu · Ke Tang · Zhi-Hua Zhou