
LIIR: Learning Individual Intrinsic Reward in Multi-Agent Reinforcement Learning
Yali Du · Lei Han · Meng Fang · Ji Liu · Tianhong Dai · Dacheng Tao

Tue Dec 05:30 PM -- 07:30 PM PST @ East Exhibition Hall B + C #198

A major challenge in cooperative decentralized multi-agent reinforcement learning (MARL) is generating diverse behaviors for each individual agent when only a team reward is received. Prior studies have devoted much effort to either reward shaping or designing a centralized critic that can credit the agents discriminatively. In this paper, we propose to merge the two directions and learn, for each agent, an intrinsic reward function that stimulates the agents differently at each time step. Specifically, the intrinsic reward of a specific agent is used to compute a distinct proxy critic for that agent, which directs the update of its individual policy. Meanwhile, the parameterized intrinsic reward function is updated to maximize the expected accumulated team reward from the environment, so that the objective remains consistent with the original MARL problem. We refer to the proposed method as learning individual intrinsic reward (LIIR) in MARL. We compare LIIR with a number of state-of-the-art MARL methods on battle games in StarCraft II. The results demonstrate the effectiveness of LIIR and show that LIIR can assign each individual agent an insightful intrinsic reward per time step.
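The abstract says each agent's intrinsic reward feeds a distinct proxy critic even though all agents share one team reward. The sketch below shows how that yields different per-agent learning targets. It assumes a proxy reward of the form team_reward + lam * intrinsic_reward for each agent, with a discount factor gamma. The function name proxy_returns and the parameters lam and gamma are hypothetical illustrations and do not come from the authors' code. The sketch only computes the discounted proxy returns a proxy critic could regress toward. It omits the outer update of the intrinsic reward parameters toward the extrinsic team return.

```python
from typing import List


def proxy_returns(
    team_rewards: List[float],
    intrinsic_rewards: List[List[float]],
    lam: float = 0.1,
    gamma: float = 0.99,
) -> List[List[float]]:
    """Discounted proxy returns G[i][t] for every agent i (hypothetical helper).

    Assumption: the per-step proxy reward of agent i is
        team_rewards[t] + lam * intrinsic_rewards[i][t],
    i.e. a shared extrinsic team reward plus a scaled agent-specific
    intrinsic reward.
    """
    horizon = len(team_rewards)
    all_returns = []
    for agent_rewards in intrinsic_rewards:
        if len(agent_rewards) != horizon:
            raise ValueError("each agent needs one intrinsic reward per time step")
        g = 0.0
        agent_returns = [0.0] * horizon
        # Accumulate backwards in time: G_t = r_t + gamma * G_{t+1}
        for t in reversed(range(horizon)):
            g = team_rewards[t] + lam * agent_rewards[t] + gamma * g
            agent_returns[t] = g
        all_returns.append(agent_returns)
    return all_returns


if __name__ == "__main__":
    # Two agents share the same team reward but receive opposite intrinsic
    # rewards at t = 0, so their proxy returns differ at that step.
    print(proxy_returns([0.0, 1.0], [[1.0, 0.0], [-1.0, 0.0]], lam=0.5, gamma=1.0))
```

With identical (for example, all-zero) intrinsic rewards, every agent gets the same target, which is exactly the credit-assignment problem the abstract describes. The per-agent intrinsic term is what lets each proxy critic differ.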

Author Information

Yali Du (University College London)

I am currently a research fellow at UCL. I am interested in multi-agent reinforcement learning, adversarial machine learning and recommendation systems.

Lei Han (Tencent AI Lab)
Meng Fang (Tencent)
Ji Liu (Kwai Inc.)
Tianhong Dai (Imperial College London)
Dacheng Tao (University of Sydney)
