Timezone: »

The Surprising Effectiveness of PPO in Cooperative Multi-Agent Games
Chao Yu · Akash Velu · Eugene Vinitsky · Jiaxuan Gao · Yu Wang · Alexandre Bayen · YI WU

Wed Nov 30 02:00 PM -- 04:00 PM (PST) @ Hall J #1027

Proximal Policy Optimization (PPO) is a ubiquitous on-policy reinforcement learning algorithm but is significantly less utilized than off-policy learning algorithms in multi-agent settings. This is often due to the belief that PPO is significantly less sample efficient than off-policy methods in multi-agent systems. In this work, we carefully study the performance of PPO in cooperative multi-agent settings. We show that PPO-based multi-agent algorithms achieve surprisingly strong performance in four popular multi-agent testbeds: the particle-world environments, the StarCraft multi-agent challenge, the Hanabi challenge, and Google Research Football, with minimal hyperparameter tuning and without any domain-specific algorithmic modifications or architectures. Importantly, compared to competitive off-policy methods, PPO often achieves competitive or superior results in both final returns and sample efficiency. Finally, through ablation studies, we analyze implementation and hyperparameter factors that are critical to PPO's empirical performance, and give concrete practical suggestions regarding these factors. Our results show that when using these practices, simple PPO-based methods are a strong baseline in cooperative multi-agent reinforcement learning. Source code is released at https://github.com/marlbenchmark/on-policy.

Author Information

Chao Yu (Tsinghua University)
Akash Velu (Stanford University)
Eugene Vinitsky (UC Berkeley)
Jiaxuan Gao (Tsinghua University, Tsinghua University)
Yu Wang (Tsinghua University)

Yu Wang received his B.S. degree in 2002 and Ph.D. degree (with honor) in 2007 from Tsinghua University, Beijing. He is currently a Tenured Associate Professor with the Department of Electronic Engineering, Tsinghua University. His research interests include brain inspired computing, application specific hardware computing, parallel circuit analysis, and power/reliability aware system design methodology. Dr. Wang has authored and coauthored over 150 papers in refereed journals and conferences. He has received Best Paper Award in FPGA 2017, ISVLSI 2012, and Best Poster Award in HEART 2012 with 8 Best Paper Nominations. He is a recipient of IBM X10 Faculty Award in 2010. He served as TPC chair for ICFPT 2011 and Finance Chair of ISLPED 2012-2016, and served as program committee member for leading conferences in these areas, including top EDA conferences such as DAC, DATE, ICCAD, ASP-DAC, and top FPGA conferences such as FPGA and FPT. Currently he serves as Co-EIC for SIGDA E-Newsletter, Associate Editor for IEEE Transactions on CAD and Journal of Circuits, Systems, and Computers. He also serves as guest editor for Integration, the VLSI Journal and IEEE Transactions on Multi-Scale Computing Systems. He is a recipient of NSFC Excellent Young Scholar,and is now serving as ACM distinguished speaker. He is an IEEE/ACM senior member.

Alexandre Bayen (University of California Berkeley)
YI WU (UC Berkeley)

More from the Same Authors