Poster
Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems
Jiayu Chen · Yuanxin Zhang · Yuanfan Xu · Huimin Ma · Huazhong Yang · Jiaming Song · Yu Wang · Yi Wu

Thu Dec 09 12:30 AM -- 02:00 AM (PST) @ Virtual

We introduce an automatic curriculum algorithm, Variational Automatic Curriculum Learning (VACL), for solving challenging goal-conditioned cooperative multi-agent reinforcement learning problems. We motivate our curriculum learning paradigm through a variational perspective, in which the learning objective decomposes into two terms: task learning on the current curriculum, and a curriculum update to a new task distribution. Local optimization over the second term suggests that the curriculum should gradually expand the training tasks from easy to hard. Our VACL algorithm implements this variational paradigm with two practical components, task expansion and entity curriculum, which together produce a series of training tasks that vary both the task configuration and the number of entities in the task. Experimental results show that VACL solves a collection of sparse-reward problems with a large number of agents. In particular, using a single desktop machine, VACL achieves a 98% coverage rate with 100 agents in the simple-spread benchmark and reproduces the ramp-use behavior originally shown in OpenAI's hide-and-seek project.
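The easy-to-hard expansion described in the abstract can be illustrated with a toy sketch. This is not the authors' VACL implementation: it assumes a one-dimensional task space where a task is just a difficulty value in [0, 1], and the names `success_rate`, `expand`, and `run_curriculum`, the scalar `skill`, and all thresholds are hypothetical stand-ins chosen for illustration. The entity curriculum (growing the number of agents) is not modeled here.

```python
import random


def success_rate(difficulty, skill):
    # Hypothetical stand-in for policy evaluation: success drops
    # linearly once a task's difficulty exceeds the current skill.
    return max(0.0, min(1.0, 1.0 - 4.0 * (difficulty - skill)))


def expand(tasks, step, rng):
    # Toy "task expansion": perturb each active task by a small
    # random amount and clip the result to the task space [0, 1].
    return [min(1.0, max(0.0, t + rng.uniform(-step, step))) for t in tasks]


def run_curriculum(iterations=200, step=0.05, lr=0.2, keep=16, seed=0):
    rng = random.Random(seed)
    skill = 0.0        # assumed scalar summary of policy competence
    active = [0.0]     # start from the easiest task
    frontier = []      # hardest active task after each iteration
    for _ in range(iterations):
        # Task learning (toy): skill moves toward the hardest active task.
        hardest = max(active)
        skill = max(skill, skill + lr * (hardest - skill))
        # Curriculum update: propose nearby tasks and keep only those
        # the current policy still has some chance of solving.
        candidates = active + expand(active, step, rng)
        learnable = [t for t in candidates if success_rate(t, skill) > 0.1]
        active = sorted(learnable)[-keep:] or active
        frontier.append(max(active))
    return skill, frontier


if __name__ == "__main__":
    final_skill, frontier = run_curriculum()
    print("final skill:", round(final_skill, 3))
    print("hardest task reached:", round(frontier[-1], 3))
```

In this sketch the hardest active task never gets easier from one iteration to the next, because tasks are only admitted once they are learnable and the skill value never decreases. That mirrors, in a very reduced form, the idea of a curriculum that expands outward from tasks the policy can already solve.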

Author Information

Jiayu Chen (Tsinghua University)
Yuanxin Zhang (Tsinghua University)
Yuanfan Xu (Tsinghua University)
Huimin Ma (Tsinghua University)
Huazhong Yang
Jiaming Song (Stanford University)

I am a first-year Ph.D. student at Stanford University. I think about problems in machine learning and deep learning under the supervision of Stefano Ermon. I did my undergrad at Tsinghua University, where I was lucky enough to collaborate with Jun Zhu and Lawrence Carin on scalable Bayesian machine learning.

Yu Wang (Tsinghua University)

Yu Wang received his B.S. degree in 2002 and his Ph.D. degree (with honor) in 2007 from Tsinghua University, Beijing. He is currently a tenured Associate Professor in the Department of Electronic Engineering, Tsinghua University. His research interests include brain-inspired computing, application-specific hardware computing, parallel circuit analysis, and power/reliability-aware system design methodology. Dr. Wang has authored and coauthored over 150 papers in refereed journals and conferences. He received the Best Paper Award at FPGA 2017 and ISVLSI 2012 and the Best Poster Award at HEART 2012, with 8 Best Paper nominations. He received the IBM X10 Faculty Award in 2010. He served as TPC chair for ICFPT 2011 and Finance Chair of ISLPED 2012-2016, and as a program committee member for leading conferences in these areas, including top EDA conferences such as DAC, DATE, ICCAD, and ASP-DAC, and top FPGA conferences such as FPGA and FPT. He currently serves as Co-EIC of the SIGDA E-Newsletter and as Associate Editor for IEEE Transactions on CAD and the Journal of Circuits, Systems, and Computers. He also serves as guest editor for Integration, the VLSI Journal and IEEE Transactions on Multi-Scale Computing Systems. He is a recipient of the NSFC Excellent Young Scholar award and is now serving as an ACM Distinguished Speaker. He is a senior member of IEEE and ACM.

Yi Wu (OpenAI)

More from the Same Authors