Efficient exploration in deep cooperative multi-agent reinforcement learning (MARL) remains challenging in complex coordination problems. In this paper, we introduce a novel Episodic Multi-agent reinforcement learning with Curiosity-driven exploration, called EMC. We leverage an insight of popular factorized MARL algorithms that the "induced" individual Q-values, i.e., the individual utility functions used for local execution, are embeddings of local action-observation histories and can capture the interaction between agents due to reward backpropagation during centralized training. Therefore, we use prediction errors of individual Q-values as intrinsic rewards for coordinated exploration and utilize episodic memory to exploit explored informative experience to boost policy training. As the dynamics of an agent's individual Q-value function capture the novelty of states and the influence from other agents, our intrinsic reward can induce coordinated exploration to new or promising states. We illustrate the advantages of our method with didactic examples and demonstrate that it significantly outperforms state-of-the-art MARL baselines on challenging tasks in the StarCraft II micromanagement benchmark.
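The two ingredients named in the abstract, an intrinsic reward built from prediction errors of individual Q-values and an episodic memory of informative experience, can be illustrated with a minimal sketch. This is not the authors' implementation: the function `curiosity_reward`, the class `EpisodicMemory`, the use of an L2 norm averaged over agents, and the choice to key the memory by a hashable state identifier that stores the best return seen so far are all assumptions made for illustration.

```python
import numpy as np


def curiosity_reward(predicted_q, target_q):
    """Hypothetical intrinsic reward: mean over agents of the L2 error
    between predicted and target individual Q-values.

    Both inputs have shape (n_agents, n_actions). The norm and the
    averaging are illustrative assumptions, not taken from the paper.
    """
    predicted_q = np.asarray(predicted_q, dtype=float)
    target_q = np.asarray(target_q, dtype=float)
    per_agent_error = np.linalg.norm(predicted_q - target_q, axis=-1)
    return float(per_agent_error.mean())


class EpisodicMemory:
    """Hypothetical episodic memory: keeps the highest return observed
    for each (hashable) state key."""

    def __init__(self):
        self._best_return = {}

    def update(self, state_key, episode_return):
        # Only overwrite when the new return improves on the stored one.
        previous = self._best_return.get(state_key)
        if previous is None or episode_return > previous:
            self._best_return[state_key] = episode_return

    def lookup(self, state_key, default=None):
        return self._best_return.get(state_key, default)
```

In this sketch, a state whose individual Q-values are poorly predicted yields a large intrinsic reward, while the memory lets training reuse the best return previously reached from a given state.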
Author Information
Lulu Zheng (Tsinghua University)
Jiarui Chen (Nanjing University)
Jianhao Wang (Tsinghua University)
Jiamin He (University of Alberta)
Yujing Hu (NetEase Fuxi AI Lab)
Yingfeng Chen (NetEase Fuxi AI Lab)
Changjie Fan (NetEase Fuxi AI Lab)
Yang Gao (Nanjing University)
Chongjie Zhang (Tsinghua University)
More from the Same Authors
- 2022: The Emphatic Approach to Average-Reward Policy Evaluation
  Jiamin He · Yi Wan · Rupam Mahmood
- 2022: Multi-Agent Policy Transfer via Task Relationship Modeling
  Rong-Jun Qin · Feng Chen · Tonghan Wang · Lei Yuan · Xiaoran Wu · Yipeng Kang · Zongzhang Zhang · Chongjie Zhang · Yang Yu
- 2022: EUCLID: Towards Efficient Unsupervised Reinforcement Learning with Multi-choice Dynamics Model
  Yifu Yuan · Jianye Hao · Fei Ni · Yao Mu · YAN ZHENG · Yujing Hu · Jinyi Liu · Yingfeng Chen · Changjie Fan
- 2022: Model and Method: Training-Time Attack for Cooperative Multi-Agent Reinforcement Learning
  Siyang Wu · Tonghan Wang · Xiaoran Wu · Jingfeng ZHANG · Yujing Hu · Changjie Fan · Chongjie Zhang
- 2022 Spotlight: CUP: Critic-Guided Policy Reuse
  Jin Zhang · Siyuan Li · Chongjie Zhang
- 2022 Spotlight: RORL: Robust Offline Reinforcement Learning via Conservative Smoothing
  Rui Yang · Chenjia Bai · Xiaoteng Ma · Zhaoran Wang · Chongjie Zhang · Lei Han
- 2022 Spotlight: Lightning Talks 5A-1
  Yao Mu · Jin Zhang · Haoyi Niu · Rui Yang · Mingdong Wu · Ze Gong · shubham sharma · Chenjia Bai · Yu ("Tony") Zhang · Siyuan Li · Yuzheng Zhuang · Fangwei Zhong · Yiwen Qiu · Xiaoteng Ma · Fei Ni · Yulong Xia · Chongjie Zhang · Hao Dong · Ming Li · Zhaoran Wang · Bin Wang · Chongjie Zhang · Jianyu Chen · Guyue Zhou · Lei Han · Jianming HU · Jianye Hao · Xianyuan Zhan · Ping Luo
- 2022 Spotlight: Non-Linear Coordination Graphs
  Yipeng Kang · Tonghan Wang · Qianlan Yang · Chongjie Zhang
- 2021 Poster: Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games
  Xiangyu Liu · Hangtian Jia · Ying Wen · Yujing Hu · Yingfeng Chen · Changjie Fan · ZHIPENG HU · Yaodong Yang
- 2021 Poster: On the Estimation Bias in Double Q-Learning
  Zhizhou Ren · Guangxiang Zhu · Hao Hu · Beining Han · Jianglun Chen · Chongjie Zhang
- 2021 Poster: Model-Based Reinforcement Learning via Imagination with Derived Memory
  Yao Mu · Yuzheng Zhuang · Bin Wang · Guangxiang Zhu · Wulong Liu · Jianyu Chen · Ping Luo · Shengbo Li · Chongjie Zhang · Jianye Hao
- 2021 Poster: Offline Reinforcement Learning with Reverse Model-based Imagination
  Jianhao Wang · Wenzhe Li · Haozhe Jiang · Guangxiang Zhu · Siyuan Li · Chongjie Zhang
- 2021 Poster: Towards Understanding Cooperative Multi-Agent Q-Learning with Value Factorization
  Jianhao Wang · Zhizhou Ren · Beining Han · Jianing Ye · Chongjie Zhang
- 2021 Poster: Celebrating Diversity in Shared Multi-Agent Reinforcement Learning
  Chenghao Li · Tonghan Wang · Chengjie Wu · Qianchuan Zhao · Jun Yang · Chongjie Zhang
- 2021 Poster: An Efficient Transfer Learning Framework for Multiagent Reinforcement Learning
  Tianpei Yang · Weixun Wang · Hongyao Tang · Jianye Hao · Zhaopeng Meng · Hangyu Mao · Dong Li · Wulong Liu · Yingfeng Chen · Yujing Hu · Changjie Fan · Chengwei Zhang
- 2020 Poster: Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping
  Yujing Hu · Weixun Wang · Hangtian Jia · Yixiang Wang · Yingfeng Chen · Jianye Hao · Feng Wu · Changjie Fan
- 2020 Poster: Bridging Imagination and Reality for Model-Based Deep Reinforcement Learning
  Guangxiang Zhu · Minghao Zhang · Honglak Lee · Chongjie Zhang
- 2019 Poster: Hierarchical Reinforcement Learning with Advantage-Based Auxiliary Rewards
  Siyuan Li · Rui Wang · Minxue Tang · Chongjie Zhang
- 2018 Poster: Object-Oriented Dynamics Predictor
  Guangxiang Zhu · Zhiao Huang · Chongjie Zhang