PMIC: Improving Multi-Agent Reinforcement Learning with Progressive Mutual Information Collaboration
Pengyi Li · Hongyao Tang · Tianpei Yang · Xiaotian Hao · Sang Tong · YAN ZHENG · Jianye Hao · Matthew Taylor · Jinyi Liu
Event URL: https://openreview.net/forum?id=4cGO_JmTSxo

Learning to collaborate is critical in multi-agent reinforcement learning (MARL). Several recent works promote collaboration by maximizing the correlation of agents’ behaviors, which is typically characterised by mutual information (MI) in different forms. Generally, a high collaboration level corresponds to high MI. However, simply maximizing the MI of agents’ behaviors does not guarantee better collaboration, because sub-optimal collaboration can also lead to high MI. To this end, we propose a novel MARL framework, called Progressive Mutual Information Collaboration (PMIC), to facilitate collaboration efficiently and stably. First, we introduce the Dual Progressive Collaboration Buffer (DPCB), which separately stores superior and inferior samples in a progressive manner. Then we train two MI estimators: one maximizes the MI associated with superior collaboration to improve agents' policies, and the other minimizes the MI associated with inferior collaboration to keep the policies from falling into local optima. Finally, PMIC is general and can be combined with existing MARL algorithms, and experiments on several MARL benchmarks show superior performance compared with other MARL algorithms.
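
The abstract does not say how the DPCB decides which samples count as superior or inferior. As a rough illustration only, the sketch below assumes that trajectories are ranked by episode return, that the superior side keeps the highest-return trajectories seen so far, and that anything rejected or evicted from it goes to a bounded inferior side. The class name `DualBuffer`, its methods, and the return-based criterion are hypothetical and are not taken from the paper.

```python
import heapq
import itertools
from collections import deque


class DualBuffer:
    """Hypothetical two-sided buffer; not the authors' implementation."""

    def __init__(self, capacity):
        self.capacity = capacity
        # Min-heap of (return, tiebreak, trajectory): the root is the
        # weakest trajectory currently held on the superior side.
        self._superior = []
        # Bounded FIFO: the oldest inferior samples are dropped first.
        self._inferior = deque(maxlen=capacity)
        # Tiebreak counter so trajectories themselves are never compared.
        self._counter = itertools.count()

    def add(self, episode_return, trajectory):
        item = (episode_return, next(self._counter), trajectory)
        if len(self._superior) < self.capacity:
            heapq.heappush(self._superior, item)
        elif episode_return > self._superior[0][0]:
            # The bar rises over time: the new sample replaces the weakest
            # superior one, which is demoted to the inferior side.
            evicted = heapq.heappushpop(self._superior, item)
            self._inferior.append(evicted)
        else:
            self._inferior.append(item)

    def superior_returns(self):
        return sorted(r for r, _, _ in self._superior)

    def inferior_returns(self):
        return [r for r, _, _ in self._inferior]


buf = DualBuffer(capacity=2)
for ret in [1.0, 3.0, 2.0, 0.5, 5.0]:
    buf.add(ret, trajectory=None)  # trajectory payload omitted in this toy run
print(buf.superior_returns(), buf.inferior_returns())
```

In a setup like this, one MI estimator would draw batches from the superior side and the other from the inferior side; how those estimates enter the policy objective is left to the paper.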

Author Information

Pengyi Li (Tianjin University)
Hongyao Tang (Tianjin University)
Tianpei Yang (Tianjin University, University of Alberta)
Xiaotian Hao (Tianjin University)
Sang Tong (university of tianjin of china)
YAN ZHENG (Tianjin University)
Jianye Hao (Tianjin University)
Matthew Taylor (University of Alberta and Amii)
Jinyi Liu (Tianjin University)

More from the Same Authors