Timezone: »
When solving two-player zero-sum games, multi-agent reinforcement learning (MARL) algorithms often create populations of agents where, at each iteration, a new agent is discovered as the best response to a mixture over the opponent population. Within such a process, the update rules of "who to compete with" (i.e., the opponent mixture) and "how to beat them" (i.e., finding best responses) are underpinned by manually developed game theoretical principles such as fictitious play and Double Oracle. In this paper, we introduce a novel framework—Neural Auto-Curricula (NAC)—that leverages meta-gradient descent to automate the discovery of the learning update rule without explicit human design. Specifically, we parameterise the opponent selection module by neural networks and the best-response module by optimisation subroutines, and update their parameters solely via interaction with the game engine, where both players aim to minimise their exploitability. Surprisingly, even without human design, the discovered MARL algorithms achieve competitive or even better performance with the state-of-the-art population-based game solvers (e.g., PSRO) on Games of Skill, differentiable Lotto, non-transitive Mixture Games, Iterated Matching Pennies, and Kuhn Poker. Additionally, we show that NAC is able to generalise from small games to large games, for example training on Kuhn Poker and outperforming PSRO on Leduc Poker. Our work inspires a promising future direction to discover general MARL algorithms solely from data.
Author Information
Xidong Feng (University College London)
Oliver Slumbers (University College London)
Ziyu Wan (Shanghai Jiao Tong University)
Bo Liu (Peking University)
Stephen McAleer (UC Irvine)
Ying Wen (UCL)
Jun Wang (University College London)
Yaodong Yang (University College London)
More from the Same Authors
-
2022 Poster: Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning »
Yuanpei Chen · Tianhao Wu · Shengjie Wang · Xidong Feng · Jiechuan Jiang · Zongqing Lu · Stephen McAleer · Hao Dong · Song-Chun Zhu · Yaodong Yang -
2022 : TorchOpt: An Efficient library for Differentiable Optimization »
Jie Ren · Xidong Feng · Bo Liu · Xuehai Pan · Yao Fu · Luo Mai · Yaodong Yang -
2022 : Contextual Transformer for Offline Meta Reinforcement Learning »
Runji Lin · Ye Li · Xidong Feng · Zhaowei Zhang · XIAN HONG WU FUNG · Haifeng Zhang · Jun Wang · Yali Du · Yaodong Yang -
2023 Poster: ChessGPT: Bridging Policy Learning and Language Modeling »
Xidong Feng · Yicheng Luo · Ziyan Wang · Hongrui Tang · Mengyue Yang · Kun Shao · David Mguni · Yali Du · Jun Wang -
2022 Spotlight: Towards Human-Level Bimanual Dexterous Manipulation with Reinforcement Learning »
Yuanpei Chen · Tianhao Wu · Shengjie Wang · Xidong Feng · Jiechuan Jiang · Zongqing Lu · Stephen McAleer · Hao Dong · Song-Chun Zhu · Yaodong Yang -
2022 Poster: EnvPool: A Highly Parallel Reinforcement Learning Environment Execution Engine »
Jiayi Weng · Min Lin · Shengyi Huang · Bo Liu · Denys Makoviichuk · Viktor Makoviychuk · Zichen Liu · Yufan Song · Ting Luo · Yukun Jiang · Zhongwen Xu · Shuicheng Yan -
2022 Poster: A Theoretical Understanding of Gradient Bias in Meta-Reinforcement Learning »
Bo Liu · Xidong Feng · Jie Ren · Luo Mai · Rui Zhu · Haifeng Zhang · Jun Wang · Yaodong Yang -
2021 : Performance-Guaranteed ODE Solvers with Complexity-Informed Neural Networks »
Feng Zhao · Xiang Chen · Jun Wang · Zuoqiang Shi · Shao-Lun Huang -
2021 Poster: Towards Unifying Behavioral and Response Diversity for Open-ended Learning in Zero-sum Games »
Xiangyu Liu · Hangtian Jia · Ying Wen · Yujing Hu · Yingfeng Chen · Changjie Fan · ZHIPENG HU · Yaodong Yang -
2021 Poster: Settling the Variance of Multi-Agent Policy Gradients »
Jakub Grudzien Kuba · Muning Wen · Linghui Meng · shangding gu · Haifeng Zhang · David Mguni · Jun Wang · Yaodong Yang -
2021 Poster: XDO: A Double Oracle Algorithm for Extensive-Form Games »
Stephen McAleer · JB Lanier · Kevin A Wang · Pierre Baldi · Roy Fox -
2020 Poster: Replica-Exchange Nos\'e-Hoover Dynamics for Bayesian Learning on Large Datasets »
Rui Luo · Qiang Zhang · Yaodong Yang · Jun Wang -
2019 : Coffee Break & Poster Session »
Samia Mohinta · Andrea Agostinelli · Alexandra Moringen · Jee Hang Lee · Yat Long Lo · Wolfgang Maass · Blue Sheffer · Colin Bredenberg · Benjamin Eysenbach · Liyu Xia · Efstratios Markou · Jan Lichtenberg · Pierre Richemond · Tony Zhang · JB Lanier · Baihan Lin · William Fedus · Glen Berseth · Marta Sarrico · Matthew Crosby · Stephen McAleer · Sina Ghiassian · Franz Scherr · Guillaume Bellec · Darjan Salaj · Arinbjörn Kolbeinsson · Matthew Rosenberg · Jaehoon Shin · Sang Wan Lee · Guillermo Cecchi · Irina Rish · Elias Hajek -
2018 : Poster Session 1 + Coffee »
Tom Van de Wiele · Rui Zhao · J. Fernando Hernandez-Garcia · Fabio Pardo · Xian Yeow Lee · Xiaolin Andy Li · Marcin Andrychowicz · Jie Tang · Suraj Nair · Juhyeon Lee · Cédric Colas · S. M. Ali Eslami · Yen-Chen Wu · Stephen McAleer · Ryan Julian · Yang Xue · Matthia Sabatelli · Pranav Shyam · Alexandros Kalousis · Giovanni Montana · Emanuele Pesce · Felix Leibfried · Zhanpeng He · Chunxiao Liu · Yanjun Li · Yoshihide Sawada · Alexander Pashevich · Tejas Kulkarni · Keiran Paster · Luca Rigazio · Quan Vuong · Hyunggon Park · Minhae Kwon · Rivindu Weerasekera · Shamane Siriwardhanaa · Rui Wang · Ozsel Kilinc · Keith Ross · Yizhou Wang · Simon Schmitt · Thomas Anthony · Evan Cater · Forest Agostinelli · Tegg Sung · Shirou Maruyama · Alexander Shmakov · Devin Schwab · Mohammad Firouzi · Glen Berseth · Denis Osipychev · Jesse Farebrother · Jianlan Luo · William Agnew · Peter Vrancx · Jonathan Heek · Catalin Ionescu · Haiyan Yin · Megumi Miyashita · Nathan Jay · Noga H. Rotman · Sam Leroux · Shaileshh Bojja Venkatakrishnan · Henri Schmidt · Jack Terwilliger · Ishan Durugkar · Jonathan Sauder · David Kas · Arash Tavakoli · Alain-Sam Cohen · Philip Bontrager · Adam Lerer · Thomas Paine · Ahmed Khalifa · Ruben Rodriguez · Avi Singh · Yiming Zhang -
2017 : Contributed Talks 1 »
Cinjon Resnick · Ying Wen · Stephan Zheng · Mukul Bhutani · Edward Choi