The challenge of redundancy on multi agent value factorisation
Siddarth Singh · Benjamin Rosman
Event URL: https://openreview.net/forum?id=OeRx-scrV51

Recently there has been rapid progress in the field of multi-agent reinforcement learning (MARL). In the cooperative, partially observable multi-agent setting, central value functions have been used to perform multi-agent credit assignment for joint global rewards. The standard solution is centralised training with decentralised execution, where a central critic conditions the policies of the cooperative agents on a central observation. Simulated training environments designed using video games typically contain exactly the number of agents required to solve the task, based on preexisting knowledge of the game dynamics and human player solutions. In a more general case, an environment is likely to contain more agents than the task requires. These redundant agents reduce overall performance by enlarging the ground truth state, if one is available, and by increasing the size of the joint policy used to solve the environment. When no ground truth state is available, a concatenation of all local observations is used instead; this concatenation scales in size with the number of agents and becomes insufficient to condition the centralised critic in large state spaces. We propose leveraging layerwise relevance propagation (LRP) to separate the learning of the joint value function from the generation of local reward signals, and we introduce a new MARL algorithm, the Relevance Decomposition Network (RDN). We compare our method to other state-of-the-art (SOTA) MARL algorithms in challenging StarCraft2 environments and simpler matrix games. We show that the performance of decomposition algorithms and the usefulness of the state space degrade as the number of redundant agents increases.
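The core idea of using LRP to decompose a joint value into per-agent signals can be sketched as follows. This is a minimal, illustrative example (not the authors' implementation): a tiny two-layer mixing network maps per-agent utilities to a joint value Q_tot, and the epsilon rule of LRP redistributes Q_tot back through the layers to obtain one relevance score per agent. All weights, shapes, and the `lrp_linear` helper are hypothetical.

```python
import numpy as np

def lrp_linear(a, w, r_out, eps=1e-6):
    """Epsilon-rule LRP for a linear layer z = a @ w.

    Redistributes the relevance r_out of the layer's outputs back onto
    its inputs a, proportionally to each input's contribution to z.
    """
    z = a @ w                               # pre-activations of the layer
    s = r_out / (z + eps * np.sign(z))      # stabilised relevance per output unit
    return a * (s @ w.T)                    # relevance attributed to each input

rng = np.random.default_rng(0)
n_agents, hidden = 3, 4
q_agents = rng.random(n_agents)             # per-agent utilities (inputs)
w1 = rng.random((n_agents, hidden))         # mixing weights, layer 1 (illustrative)
w2 = rng.random((hidden, 1))                # mixing weights, layer 2 (illustrative)

h = np.maximum(q_agents @ w1, 0)            # ReLU hidden layer
q_tot = h @ w2                              # joint value Q_tot

r_hidden = lrp_linear(h, w2, q_tot)         # propagate relevance to hidden units
r_agents = lrp_linear(q_agents, w1, r_hidden)   # ...and down to each agent

# Conservation: per-agent relevances approximately sum back to Q_tot,
# so they can serve as local credit-assignment signals.
print(q_tot.item(), r_agents.sum())
```

The conservation property (relevances at each layer summing to the joint value) is what makes the per-agent scores usable as local reward signals, independently of how many redundant agents pad the input.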

Author Information

Siddarth Singh (University of the Witwatersrand)
Benjamin Rosman (University of the Witwatersrand)