Goal-conditioned reinforcement learning (RL) usually suffers from sparse rewards and inefficient exploration in long-horizon tasks. Planning can find the shortest path to a distant goal and thereby provide dense reward/guidance, but it is inaccurate without a precise environment model. We show that RL and planning can collaboratively learn from each other to overcome their own drawbacks. In CO-PILOT, a learnable path-planner and an RL agent produce dense feedback to train each other on a curriculum of tree-structured sub-tasks. First, the planner recursively decomposes a long-horizon task into a tree of sub-tasks in a top-down manner; the layers of the tree form coarse-to-fine sub-task sequences that serve as plans to complete the original task. The planning policy is trained to minimize the RL agent's cost of completing the sequence in each layer, proceeding from the top layer to the bottom layer. This gradually increases the number of sub-tasks and thus forms an easy-to-hard curriculum for the planner. Next, a bottom-up traversal of the tree trains the RL agent, moving from easier sub-tasks with denser rewards on the bottom layers to harder ones on the top layers, and collects the agent's cost on each sub-task to train the planner in the next episode. CO-PILOT repeats this mutual training for multiple episodes before switching to a new task, so the RL agent and planner are fully optimized to facilitate each other's training. We compare CO-PILOT with RL (SAC, HER, PPO), planning (RRT*, NEXT, SGT), and their combination (SoRB) on navigation and continuous control tasks. CO-PILOT significantly improves the success rate and sample efficiency.
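The tree structure described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the learned planner is replaced here by a hypothetical `midpoint` splitter, and the names `SubTask`, `decompose`, and `layers` are assumptions introduced only for illustration. The sketch shows how a top-down recursive split yields layers of coarse-to-fine sub-task sequences, and how reversing those layers gives the bottom-up order in which an agent could be trained from easier to harder sub-tasks.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

Point = Tuple[float, float]


@dataclass
class SubTask:
    """A (start, goal) pair; children split it into two shorter sub-tasks."""
    start: Point
    goal: Point
    children: List["SubTask"] = field(default_factory=list)


def midpoint(a: Point, b: Point) -> Point:
    # Hypothetical stand-in for the learned planner: pick the geometric midpoint
    # as the intermediate sub-goal. A real planner would predict this point.
    return ((a[0] + b[0]) / 2, (a[1] + b[1]) / 2)


def decompose(start: Point, goal: Point, depth: int) -> SubTask:
    """Top-down recursive decomposition into a binary tree of sub-tasks."""
    node = SubTask(start, goal)
    if depth > 0:
        mid = midpoint(start, goal)
        node.children = [
            decompose(start, mid, depth - 1),
            decompose(mid, goal, depth - 1),
        ]
    return node


def layers(root: SubTask) -> List[List[SubTask]]:
    """Collect the tree layer by layer; each layer is one sub-task sequence."""
    out: List[List[SubTask]] = []
    current = [root]
    while current:
        out.append(current)
        current = [child for node in current for child in node.children]
    return out


tree = decompose((0.0, 0.0), (8.0, 0.0), depth=2)
plan_layers = layers(tree)               # coarse-to-fine: 1, 2, then 4 sub-tasks
bottom_up = list(reversed(plan_layers))  # finest (easiest) sub-tasks first

for level, layer in enumerate(plan_layers):
    print(level, [(t.start, t.goal) for t in layer])
```

In this toy setting each deeper layer doubles the number of sub-tasks while halving their length, which mirrors the easy-to-hard curriculum for the planner (more sub-tasks per layer) and the easy-to-hard curriculum for the agent (shorter sub-tasks first).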
Author Information
Shuang Ao (University of Technology Sydney)
Tianyi Zhou (University of Washington, Seattle)

Tianyi Zhou (https://tianyizhou.github.io) is a tenure-track assistant professor of computer science at the University of Maryland, College Park. He received his Ph.D. from the school of computer science & engineering at the University of Washington, Seattle. His research interests are in machine learning, optimization, and natural language processing (NLP). His recent work studies curriculum learning that combines high-level human learning strategies with model training dynamics to create a hybrid intelligence. Applications include semi/self-supervised learning, robust learning, reinforcement learning, meta-learning, and ensemble learning. He has published more than 80 papers and is a recipient of the Best Student Paper Award at ICDM 2013 and the 2020 IEEE Computer Society TCSC Most Influential Paper Award.
Guodong Long (University of Technology Sydney (UTS))
Qinghua Lu (Data61, CSIRO)
Liming Zhu (CSIRO)
Jing Jiang (University of Technology Sydney)
More from the Same Authors
- 2021 Spotlight: Constrained Robust Submodular Partitioning
  Shengjie Wang · Tianyi Zhou · Chandrashekhar Lavania · Jeff A Bilmes
- 2022 Spotlight: Federated Learning from Pre-Trained Models: A Contrastive Learning Approach
  Yue Tan · Guodong Long · Jie Ma · LU LIU · Tianyi Zhou · Jing Jiang
- 2022 Spotlight: Lightning Talks 3A-1
  Shu Ding · Wanxing Chang · Jiyang Guan · Mouxiang Chen · Guan Gui · Yue Tan · Shiyun Lin · Guodong Long · Yuze Han · Wei Wang · Zhen Zhao · Ye Shi · Jian Liang · Chenghao Liu · Lei Qi · Ran He · Jie Ma · Zemin Liu · Xiang Li · Hoang Tuan · Luping Zhou · Zhihua Zhang · Jianling Sun · Jingya Wang · LU LIU · Tianyi Zhou · Lei Wang · Jing Jiang · Yinghuan Shi
- 2022 Spotlight: Adversarial Auto-Augment with Label Preservation: A Representation Learning Principle Guided Approach
  Kaiwen Yang · Yanchao Sun · Jiahao Su · Fengxiang He · Xinmei Tian · Furong Huang · Tianyi Zhou · Dacheng Tao
- 2022 Poster: Federated Learning from Pre-Trained Models: A Contrastive Learning Approach
  Yue Tan · Guodong Long · Jie Ma · LU LIU · Tianyi Zhou · Jing Jiang
- 2022 Poster: Adversarial Auto-Augment with Label Preservation: A Representation Learning Principle Guided Approach
  Kaiwen Yang · Yanchao Sun · Jiahao Su · Fengxiang He · Xinmei Tian · Furong Huang · Tianyi Zhou · Dacheng Tao
- 2022 Poster: Retrospective Adversarial Replay for Continual Learning
  Lilly Kumari · Shengjie Wang · Tianyi Zhou · Jeff A Bilmes
- 2021 Poster: Constrained Robust Submodular Partitioning
  Shengjie Wang · Tianyi Zhou · Chandrashekhar Lavania · Jeff A Bilmes
- 2021 Poster: Class-Disentanglement and Applications in Adversarial Detection and Defense
  Kaiwen Yang · Tianyi Zhou · Yonggang Zhang · Xinmei Tian · Dacheng Tao
- 2020 Poster: Curriculum Learning by Dynamic Instance Hardness
  Tianyi Zhou · Shengjie Wang · Jeffrey A Bilmes
- 2020 Poster: MESA: Boost Ensemble Imbalanced Learning with MEta-SAmpler
  Zhining Liu · Pengfei Wei · Jing Jiang · Wei Cao · Jiang Bian · Yi Chang
- 2020 Poster: Cooperative Heterogeneous Deep Reinforcement Learning
  Han Zheng · Pengfei Wei · Jing Jiang · Guodong Long · Qinghua Lu · Chengqi Zhang
- 2019 Poster: Curriculum-guided Hindsight Experience Replay
  Meng Fang · Tianyi Zhou · Yali Du · Lei Han · Zhengyou Zhang
- 2019 Poster: Learning to Propagate for Graph Meta-Learning
  LU LIU · Tianyi Zhou · Guodong Long · Jing Jiang · Chengqi Zhang
- 2018 Poster: Diverse Ensemble Evolution: Curriculum Data-Model Marriage
  Tianyi Zhou · Shengjie Wang · Jeffrey A Bilmes
- 2014 Poster: Divide-and-Conquer Learning by Anchoring a Conical Hull
  Tianyi Zhou · Jeffrey A Bilmes · Carlos Guestrin