More Efficient Adversarial Imitation Learning Algorithms With Known and Unknown Transitions
Tian Xu · Ziniu Li · Yang Yu
Tue Dec 14 09:00 AM – 10:00 AM (PST)
In this work, we design provably (more) efficient imitation learning algorithms that directly optimize policies from expert demonstrations. First, when the transition function is known, we build on the nearly minimax-optimal algorithm MIMIC-MD and relax a projection operator in it. Based on this change, we develop an adversarial imitation learning (AIL) algorithm named TAIL with a gradient-based optimization procedure. Accordingly, TAIL has the same sample complexity (i.e., the number of expert trajectories) $\widetilde{\mathcal{O}}(H^{3/2} \mathcal{S}/\varepsilon)$ as MIMIC-MD, where $H$ is the planning horizon, $\mathcal{S}$ is the state space size, and $\varepsilon$ is the desired policy value gap. This implies that TAIL is better than conventional AIL methods such as FEM and GTAL, since they have a sample complexity of $\widetilde{\mathcal{O}}(H^2 \mathcal{S} / \varepsilon^2)$. In addition, TAIL is more practical than MIMIC-MD, as the former has a space complexity of $\mathcal{O}(\mathcal{S}\mathcal{A}H)$ while the latter's is about $\mathcal{O}(\mathcal{S}^2 \mathcal{A}^2 H^2)$. Second, when the transition function is unknown but interaction with the environment is allowed, we present an extension of TAIL named MB-TAIL. The sample complexity of MB-TAIL is $\widetilde{\mathcal{O}}(H^{3/2} \mathcal{S}/\varepsilon)$, while the interaction complexity (i.e., the number of interaction episodes) is $\widetilde{\mathcal{O}}(H^3 \mathcal{S}^2 \mathcal{A} / \varepsilon^2)$. In particular, MB-TAIL is significantly better than the best-known OAL algorithm in both sample complexity and interaction complexity. The advances in MB-TAIL rest on a new framework that connects reward-free exploration and AIL. To our understanding, MB-TAIL is the first algorithm that transfers the advances in the known-transition setting to the unknown-transition setting. Finally, we provide numerical results to support our theoretical claims and to explain some empirical observations in practice.
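As a back-of-envelope illustration (not from the paper), the gap between the two sample-complexity bounds quoted above can be checked numerically. The snippet below evaluates only the dominant terms; the constants and logarithmic factors hidden by the $\widetilde{\mathcal{O}}$ notation are ignored, and the concrete values of $H$, $\mathcal{S}$, and $\varepsilon$ are arbitrary.

```python
# Illustrative comparison of the dominant terms in the sample-complexity
# bounds stated in the abstract. Hidden constants and log factors are dropped.

def tail_samples(H, S, eps):
    # TAIL / MIMIC-MD bound: ~ H^{3/2} * S / eps
    return H ** 1.5 * S / eps

def fem_gtal_samples(H, S, eps):
    # FEM / GTAL bound: ~ H^2 * S / eps^2
    return H ** 2 * S / eps ** 2

# Arbitrary example instance: horizon 100, 50 states, target gap 0.1.
H, S, eps = 100, 50, 0.1
print(tail_samples(H, S, eps))      # dominant term for TAIL
print(fem_gtal_samples(H, S, eps))  # dominant term for FEM/GTAL

# The ratio between the two bounds scales as H^{1/2} / eps, so the
# advantage of TAIL widens for long horizons and small target gaps.
print(fem_gtal_samples(H, S, eps) / tail_samples(H, S, eps))
```

The ratio $H^{1/2}/\varepsilon$ is exactly the improvement the abstract claims over FEM and GTAL.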
Author Information
Tian Xu (Nanjing University)
Ziniu Li (The Chinese University of Hong Kong, Shenzhen)
Yang Yu (Nanjing University)
More from the Same Authors

2021 : HyperDQN: A Randomized Exploration Method for Deep Reinforcement Learning »
Ziniu Li · Yingru Li · Yushun Zhang · Tong Zhang · Zhi-Quan Luo
2020 Poster: Error Bounds of Imitating Policies and Environments »
Tian Xu · Ziniu Li · Yang Yu 
2020 Poster: Offline Imitation Learning with a Misspecified Simulator »
Shengyi Jiang · Jingcheng Pang · Yang Yu 
2019 Poster: Bridging Machine Learning and Logical Reasoning by Abductive Learning »
Wang-Zhou Dai · Qiuling Xu · Yang Yu · Zhi-Hua Zhou
2017 Poster: Subset Selection under Noise »
Chao Qian · Jing-Cheng Shi · Yang Yu · Ke Tang · Zhi-Hua Zhou