Measuring and promoting policy diversity is critical for solving games with strong non-transitive dynamics, where strategic cycles exist and there is no consistent winner (e.g., Rock-Paper-Scissors). With that in mind, maintaining a pool of diverse policies via open-ended learning is an attractive solution, as it can generate auto-curricula that avoid being exploited. However, in conventional open-ended learning algorithms there is no widely accepted definition of diversity, making it hard to construct and evaluate diverse policies. In this work, we summarize previous concepts of diversity and work towards offering a unified measure of diversity in multi-agent open-ended learning that covers all elements of Markov games, based on both Behavioral Diversity (BD) and Response Diversity (RD). At the trajectory-distribution level, we re-define BD in the state-action space as discrepancies between occupancy measures. For reward dynamics, we propose RD to characterize diversity through the responses of policies when encountering different opponents. We also show that many existing diversity measures fall into the category of BD or RD, but not both. With this unified diversity measure, we design a corresponding diversity-promoting objective and a population-effectivity measure for seeking best responses in open-ended learning. We validate our methods in relatively simple settings (a matrix game and a non-transitive mixture model) as well as in the complex Google Research Football environment. The populations found by our method exhibit the lowest exploitability and highest population effectivity in the matrix game and the non-transitive mixture model, as well as the largest goal difference when playing against opponents of various levels in Google Research Football.
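To make the two quantities concrete, the NumPy sketch below illustrates one way they could be estimated; it is our own simplified illustration, not the paper's released implementation. It approximates BD as a total-variation discrepancy between empirical state-action occupancy measures, and RD as the residual of a candidate's payoff vector against a fixed opponent pool after projecting onto the payoff vectors of the existing population. The choice of divergence, the projection-based residual, and all function names are assumptions made for this example.

```python
# Illustrative sketch (simplifying assumptions, not the paper's exact formulation):
# BD = discrepancy between empirical state-action occupancy measures;
# RD = how far a candidate's payoff vector (one entry per opponent) lies
#      outside the span of the population's existing payoff vectors.
import numpy as np

def empirical_occupancy(trajectories, n_states, n_actions):
    """Estimate a state-action occupancy measure from (state, action) pairs."""
    counts = np.zeros((n_states, n_actions))
    for traj in trajectories:
        for s, a in traj:
            counts[s, a] += 1
    return counts / counts.sum()

def behavioral_diversity(occ_new, occ_pool):
    """BD of a candidate: total-variation distance to its closest pool member."""
    return min(0.5 * np.abs(occ_new - occ).sum() for occ in occ_pool)

def response_diversity(payoff_new, payoff_pool):
    """RD of a candidate: residual of its payoff vector after projecting
    onto the span of the existing population's payoff vectors."""
    A = np.stack(payoff_pool, axis=1)                # (n_opponents, pool_size)
    coeffs, *_ = np.linalg.lstsq(A, payoff_new, rcond=None)
    return np.linalg.norm(payoff_new - A @ coeffs)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: two pool policies and one candidate in a 4-state, 3-action game,
    # each evaluated against 5 fixed opponents.
    def make_traj():
        return [[(rng.integers(4), rng.integers(3)) for _ in range(20)]]
    occ_pool = [empirical_occupancy(make_traj(), 4, 3) for _ in range(2)]
    occ_new = empirical_occupancy(make_traj(), 4, 3)
    payoff_pool = [rng.normal(size=5) for _ in range(2)]
    payoff_new = rng.normal(size=5)
    print("BD:", behavioral_diversity(occ_new, occ_pool))
    print("RD:", response_diversity(payoff_new, payoff_pool))
```

In the paper the two terms are combined into a single diversity-promoting objective when computing best responses; the sketch only shows how each term could be measured for one candidate policy against a given population and opponent pool.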
Author Information
Xiangyu Liu (Shanghai Jiao Tong University)
Hangtian Jia (NetEase Fuxi AI Lab)
Ying Wen (UCL)
Yujing Hu (NetEase Fuxi AI Lab)
Yingfeng Chen (NetEase Fuxi AI Lab)
Changjie Fan (NetEase Fuxi AI Lab)
Zhipeng Hu (NetEase)
Yaodong Yang (University College London)
More from the Same Authors
- 2022 : Controllable Attack and Improved Adversarial Training in Multi-Agent Reinforcement Learning
  Xiangyu Liu · Souradip Chakraborty · Furong Huang
- 2022 : EUCLID: Towards Efficient Unsupervised Reinforcement Learning with Multi-choice Dynamics Model
  Yifu Yuan · Jianye Hao · Fei Ni · Yao Mu · Yan Zheng · Yujing Hu · Jinyi Liu · Yingfeng Chen · Changjie Fan
- 2022 : Model and Method: Training-Time Attack for Cooperative Multi-Agent Reinforcement Learning
  Siyang Wu · Tonghan Wang · Xiaoran Wu · Jingfeng Zhang · Yujing Hu · Changjie Fan · Chongjie Zhang
- 2022 : Contributed Talk: Controllable Attack and Improved Adversarial Training in Multi-Agent Reinforcement Learning
  Xiangyu Liu · Souradip Chakraborty · Furong Huang
- 2021 Poster: Episodic Multi-agent Reinforcement Learning with Curiosity-driven Exploration
  Lulu Zheng · Jiarui Chen · Jianhao Wang · Jiamin He · Yujing Hu · Yingfeng Chen · Changjie Fan · Yang Gao · Chongjie Zhang
- 2021 Poster: Settling the Variance of Multi-Agent Policy Gradients
  Jakub Grudzien Kuba · Muning Wen · Linghui Meng · Shangding Gu · Haifeng Zhang · David Mguni · Jun Wang · Yaodong Yang
- 2021 Poster: An Efficient Transfer Learning Framework for Multiagent Reinforcement Learning
  Tianpei Yang · Weixun Wang · Hongyao Tang · Jianye Hao · Zhaopeng Meng · Hangyu Mao · Dong Li · Wulong Liu · Yingfeng Chen · Yujing Hu · Changjie Fan · Chengwei Zhang
- 2021 Poster: Neural Auto-Curricula in Two-Player Zero-Sum Games
  Xidong Feng · Oliver Slumbers · Ziyu Wan · Bo Liu · Stephen McAleer · Ying Wen · Jun Wang · Yaodong Yang
- 2020 Poster: Learning to Utilize Shaping Rewards: A New Approach of Reward Shaping
  Yujing Hu · Weixun Wang · Hangtian Jia · Yixiang Wang · Yingfeng Chen · Jianye Hao · Feng Wu · Changjie Fan
- 2020 Poster: Replica-Exchange Nosé-Hoover Dynamics for Bayesian Learning on Large Datasets
  Rui Luo · Qiang Zhang · Yaodong Yang · Jun Wang
- 2017 : Contributed Talks 1
  Cinjon Resnick · Ying Wen · Stephan Zheng · Mukul Bhutani · Edward Choi