Multi-agent reinforcement learning (MARL) provides a framework for problems involving multiple interacting agents. Despite their apparent similarity to the single-agent case, multi-agent problems are often harder to train and to analyze theoretically. In this work, we propose MA-Trace, a new on-policy actor-critic algorithm that extends V-Trace to the MARL setting. The key advantage of our algorithm is its high scalability in a multi-worker setting. To this end, MA-Trace uses importance sampling as an off-policy correction, which allows the computation to be distributed with no impact on the quality of training. Furthermore, our algorithm is theoretically grounded -- we prove a fixed-point theorem that guarantees convergence. We evaluate the algorithm extensively on the StarCraft Multi-Agent Challenge, a standard benchmark for multi-agent algorithms. MA-Trace achieves high performance on all of its tasks and exceeds state-of-the-art results on some of them.
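The importance-sampling correction that MA-Trace inherits from V-Trace can be sketched as follows. This is an illustrative single-trajectory sketch of the standard V-Trace target with truncated importance weights (Espeholt et al., 2018), not the authors' implementation; the function name, signature, and default clipping thresholds here are our own assumptions.

```python
import numpy as np

def vtrace_targets(values, rewards, target_logp, behavior_logp,
                   bootstrap_value, gamma=0.99, rho_bar=1.0, c_bar=1.0):
    """V-Trace value targets with truncated importance weights.

    values        : V(x_t) under the current critic, shape (T,)
    rewards       : r_t, shape (T,)
    target_logp   : log pi(a_t|x_t) under the learner policy, shape (T,)
    behavior_logp : log mu(a_t|x_t) under the behavior policy, shape (T,)
    """
    T = len(rewards)
    ratios = np.exp(target_logp - behavior_logp)   # pi/mu per step
    rhos = np.minimum(rho_bar, ratios)             # clip the TD-error weight
    cs = np.minimum(c_bar, ratios)                 # clip the trace weight
    next_values = np.append(values[1:], bootstrap_value)
    deltas = rhos * (rewards + gamma * next_values - values)
    vs = np.zeros(T)
    acc = 0.0
    for t in reversed(range(T)):                   # backward recursion
        acc = deltas[t] + gamma * cs[t] * acc      # v_t - V(x_t)
        vs[t] = values[t] + acc
    return vs
```

When the learner and behavior policies coincide, the weights are all 1 and the targets reduce to the usual n-step bootstrapped return, which is why distributing rollout collection across stale workers does not bias training.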
Author Information
Michał Zawalski (University of Warsaw)
Błażej Osiński (University of Warsaw)
Henryk Michalewski (University of Warsaw, Google)
Piotr Miłoś (Polish Academy of Sciences, University of Oxford)
More from the Same Authors
- 2020 : Paper 44: CARLA Real Traffic Scenarios – novel training ground and benchmark for autonomous driving
  Błażej Osiński · Piotr Miłoś · Adam Jakubowski · Christopher Galias · Silviu Homoceanu
- 2021 : Continuous Control With Ensemble Deep Deterministic Policy Gradients
  Piotr Januszewski · Mateusz Olko · Michał Królikowski · Jakub Swiatkowski · Marcin Andrychowicz · Łukasz Kuciński · Piotr Miłoś
- 2021 Poster: Sparse is Enough in Scaling Transformers
  Sebastian Jaszczur · Aakanksha Chowdhery · Afroz Mohiuddin · Łukasz Kaiser · Wojciech Gajewski · Henryk Michalewski · Jonni Kanerva
- 2021 Poster: Subgoal Search For Complex Reasoning Tasks
  Konrad Czechowski · Tomasz Odrzygóźdź · Marek Zbysiński · Michał Zawalski · Krzysztof Olejnik · Yuhuai Wu · Łukasz Kuciński · Piotr Miłoś
- 2021 Poster: Catalytic Role Of Noise And Necessity Of Inductive Biases In The Emergence Of Compositional Communication
  Łukasz Kuciński · Tomasz Korbak · Paweł Kołodziej · Piotr Miłoś
- 2021 Poster: Continual World: A Robotic Benchmark For Continual Reinforcement Learning
  Maciej Wołczyk · Michał Zając · Razvan Pascanu · Łukasz Kuciński · Piotr Miłoś
- 2019 : Coffee + Posters
  Changhao Chen · Nils Gählert · Edouard Leurent · Johannes Lehner · Apratim Bhattacharyya · Harkirat Singh Behl · TeckYian Lim · Shiho Kim · Jelena Novosel · Błażej Osiński · Arindam Das · Ruobing Shen · Jeffrey Hawke · Joachim Sicking · Babak Shahian Jahromi · Theja Tulabandhula · Claudio Michaelis · Evgenia Rusak · Wenhang Bao · Hazem Rashed · JP Chen · Amin Ansari · Jaekwang Cha · Mohamed Zahran · Daniele Reda · Jinhyuk Kim · Kim Dohyun · Ho Suk · Junekyo Jhung · Alexander Kister · Matthias Fahrland · Adam Jakubowski · Piotr Miłoś · Jean Mercat · Bruno Arsenali · Silviu Homoceanu · Xiao-Yang Liu · Philip Torr · Ahmad El Sallab · Ibrahim Sobh · Anurag Arnab · Christopher Galias
- 2018 : Coffee Break and Poster Session I
  Pim de Haan · Bin Wang · Dequan Wang · Aadil Hayat · Ibrahim Sobh · Muhammad Asif Rana · Thibault Buhet · Nicholas Rhinehart · Arjun Sharma · Alex Bewley · Michael Kelly · Lionel Blondé · Ozgur S. Oguz · Vaibhav Viswanathan · Jeroen Vanbaar · Konrad Żołna · Negar Rostamzadeh · Rowan McAllister · Sanjay Thakur · Alexandros Kalousis · Chelsea Sidrane · Sujoy Paul · Daphne Chen · Michal Garmulewicz · Henryk Michalewski · Coline Devin · Hongyu Ren · Jiaming Song · Wen Sun · Hanzhang Hu · Wulong Liu · Emilie Wirbel
- 2018 Poster: Reinforcement Learning of Theorem Proving
  Cezary Kaliszyk · Josef Urban · Henryk Michalewski · Miroslav Olšák