Promoting behavioural diversity is of critical importance in multi-agent reinforcement learning, since it helps the agent population maintain robust performance when encountering unfamiliar opponents at test time or when the game is highly non-transitive in the strategy space (e.g., Rock-Paper-Scissors). While a myriad of diversity metrics have been proposed, there are no widely accepted or unified definitions in the literature, which makes the resulting diversity-aware learning algorithms difficult to evaluate and their insights elusive. In this work, we propose a novel metric called the Unified Diversity Measure (UDM) that offers a unified view of existing diversity metrics. Based on UDM, we design the UDM-Fictitious Play (UDM-FP) and UDM-Policy Space Response Oracle (UDM-PSRO) algorithms as efficient solvers for normal-form games and open-ended games. In theory, we prove that UDM-based methods can enlarge the gamescape by increasing the response capacity of the strategy pool, and that they have a convergence guarantee to a two-player Nash equilibrium. We validate our algorithms on games that show strong non-transitivity, and empirical results show that our algorithms achieve better performance than strong PSRO baselines in terms of exploitability and population effectivity.
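To make the terms non-transitivity and exploitability concrete, here is a minimal sketch on Rock-Paper-Scissors. It uses plain fictitious play, not the UDM-FP or UDM-PSRO algorithms from the paper, and every name in it (`RPS`, `exploitability`, `fictitious_play`) is a hypothetical choice made for illustration. It assumes a symmetric zero-sum game with value 0, so exploitability reduces to the best-response payoff against the strategy.

```python
import numpy as np

# Row player's payoff matrix for Rock-Paper-Scissors (zero-sum, antisymmetric).
# Illustrative only: this is not the paper's UDM-FP, just plain fictitious play.
RPS = np.array([
    [0.0, -1.0, 1.0],   # Rock     vs Rock, Paper, Scissors
    [1.0, 0.0, -1.0],   # Paper    vs Rock, Paper, Scissors
    [-1.0, 1.0, 0.0],   # Scissors vs Rock, Paper, Scissors
])


def exploitability(payoff, strategy):
    """Best-response payoff against `strategy`.

    Assumes a symmetric zero-sum game with value 0, so this is exactly
    how much a best-responding opponent gains over the equilibrium value.
    """
    return float(np.max(payoff @ strategy))


def fictitious_play(payoff, iterations=2000):
    """Self-play fictitious play; returns the empirical average strategy."""
    counts = np.zeros(payoff.shape[0])
    counts[0] = 1.0  # arbitrary starting action (Rock)
    for _ in range(iterations):
        average = counts / counts.sum()
        # Best respond to the opponent's empirical average so far.
        best_response = int(np.argmax(payoff @ average))
        counts[best_response] += 1.0
    return counts / counts.sum()


pure_rock = np.array([1.0, 0.0, 0.0])
uniform = np.ones(3) / 3.0
average_strategy = fictitious_play(RPS)

print("pure Rock:", exploitability(RPS, pure_rock))
print("uniform:", exploitability(RPS, uniform))
print("fictitious play average:", exploitability(RPS, average_strategy))
```

A single pure strategy such as Rock is fully exploitable because some other strategy always beats it, which is the non-transitive cycle the abstract refers to. The uniform mixture is the Nash equilibrium and cannot be exploited, and the fictitious-play average moves toward it as iterations increase.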
Author Information
Zongkai Liu (Sun Yat-sen University)
Chao Yu (Sun Yat-sen University)
Yaodong Yang (AIG)
peng sun (Tencent AI Lab)
Zifan Wu (Sun Yat-sen University)
Yuan Li (Academy of Military Sciences)
More from the Same Authors
- 2022 Poster: Meta-Reward-Net: Implicitly Differentiable Reward Learning for Preference-based Reinforcement Learning
  Runze Liu · Fengshuo Bai · Yali Du · Yaodong Yang
- 2022 Poster: Constrained Update Projection Approach to Safe Policy Optimization
  Long Yang · Jiaming Ji · Juntao Dai · Linrui Zhang · Binbin Zhou · Pengfei Li · Yaodong Yang · Gang Pan
- 2022 Poster: Plan To Predict: Learning an Uncertainty-Foreseeing Model For Model-Based Reinforcement Learning
  Zifan Wu · Chao Yu · Chen Chen · Jianye Hao · Hankz Hankui Zhuo
- 2022 Poster: Heterogeneous Skill Learning for Multi-agent Tasks
  Yuntao Liu · Yuan Li · Xinhai Xu · Yong Dou · Donghong Liu
- 2022 Poster: Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning
  Yuanpei Chen · Tianhao Wu · Shengjie Wang · Xidong Feng · Jiechuan Jiang · Zongqing Lu · Stephen McAleer · Hao Dong · Song-Chun Zhu · Yaodong Yang
- 2022 : TorchOpt: An Efficient library for Differentiable Optimization
  Jie Ren · Xidong Feng · Bo Liu · Xuehai Pan · Yao Fu · Luo Mai · Yaodong Yang
- 2022 : Contextual Transformer for Offline Meta Reinforcement Learning
  Runji Lin · Ye Li · Xidong Feng · Zhaowei Zhang · XIAN HONG WU FUNG · Haifeng Zhang · Jun Wang · Yali Du · Yaodong Yang
- 2023 Poster: Hierarchical Multi-Agent Skill Discovery
  Mingyu Yang · Yaodong Yang · Zhenbo Lu · Wengang Zhou · Houqiang Li
- 2023 Poster: Team-PSRO for Learning Approximate TMECor in Large Team Games via Cooperative Reinforcement Learning
  Stephen McAleer · Gabriele Farina · Gaoyue Zhou · Mingzhi Wang · Yaodong Yang · Tuomas Sandholm
- 2023 Poster: Multi-Agent First Order Constrained Optimization in Policy Space
  Youpeng Zhao · Yaodong Yang · Zhenbo Lu · Wengang Zhou · Houqiang Li
- 2023 Poster: Policy Space Diversity for Non-Transitive Games
  Jian Yao · Weiming Liu · Haobo Fu · Yaodong Yang · Stephen McAleer · Qiang Fu · Wei Yang
- 2023 Poster: Hybrid Policy Optimization from Imperfect Demonstrations
  Hanlin Yang · Chao Yu · peng sun · Siji Chen
- 2023 Poster: BeaverTails: A Human-Preference Dataset for LLM Harmlessness Alignment
  Jiaming Ji · Mickel Liu · Josef Dai · Xuehai Pan · Chi Zhang · Ce Bian · Boyuan Chen · Ruiyang Sun · Yizhou Wang · Yaodong Yang
- 2023 Poster: Safety Gymnasium: A Unified Safe Reinforcement Learning Benchmark
  Jiaming Ji · Borong Zhang · Jiayi Zhou · Xuehai Pan · Weidong Huang · Ruiyang Sun · Yiran Geng · Josef Dai · Yaodong Yang
- 2022 Spotlight: Plan To Predict: Learning an Uncertainty-Foreseeing Model For Model-Based Reinforcement Learning
  Zifan Wu · Chao Yu · Chen Chen · Jianye Hao · Hankz Hankui Zhuo
- 2022 Spotlight: Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning
  Yuanpei Chen · Tianhao Wu · Shengjie Wang · Xidong Feng · Jiechuan Jiang · Zongqing Lu · Stephen McAleer · Hao Dong · Song-Chun Zhu · Yaodong Yang
- 2022 Poster: MATE: Benchmarking Multi-Agent Reinforcement Learning in Distributed Target Coverage Control
  Xuehai Pan · Mickel Liu · Fangwei Zhong · Yaodong Yang · Song-Chun Zhu · Yizhou Wang
- 2022 Poster: Multi-Agent Reinforcement Learning is a Sequence Modeling Problem
  Muning Wen · Jakub Kuba · Runji Lin · Weinan Zhang · Ying Wen · Jun Wang · Yaodong Yang
- 2022 Poster: A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning
  Bo Liu · Xidong Feng · Jie Ren · Luo Mai · Rui Zhu · Haifeng Zhang · Jun Wang · Yaodong Yang
- 2021 Poster: Coordinated Proximal Policy Optimization
  Zifan Wu · Chao Yu · Deheng Ye · Junge Zhang · haiyin piao · Hankz Hankui Zhuo
- 2018 Poster: Thermostat-assisted continuously-tempered Hamiltonian Monte Carlo for Bayesian learning
  Rui Luo · Jianhong Wang · Yaodong Yang · Jun WANG · Zhanxing Zhu
- 2018 Poster: Exponentially Weighted Imitation Learning for Batched Historical Data
  Qing Wang · Jiechao Xiong · Lei Han · peng sun · Han Liu · Tong Zhang
- 2017 : Aligned AI Poster Session
  Amanda Askell · Rafal Muszynski · William Wang · Yaodong Yang · Quoc Nguyen · Bryan Kian Hsiang Low · Patrick Jaillet · Candice Schumann · Anqi Liu · Peter Eckersley · Angelina Wang · William Saunders
- 2017 : Poster Session
  Shunsuke Horii · Heejin Jeong · Tobias Schwedes · Qing He · Ben Calderhead · Ertunc Erdil · Jaan Altosaar · Patrick Muchmore · Rajiv Khanna · Ian Gemp · Pengfei Zhang · Yuan Zhou · Chris Cremer · Maria DeYoreo · Alexander Terenin · Brendan McVeigh · Rachit Singh · Yaodong Yang · Erik Bodin · Trefor Evans · Henry Chai · Shandian Zhe · Jeffrey Ling · Vincent ADAM · Lars Maaløe · Andrew Miller · Ari Pakman · Josip Djolonga · Hong Ge