Workshop: Causal Inference Challenges in Sequential Decision Making: Bridging Theory and Practice

MAGNET: Multi-Agent Graph Cooperative Bandits

Hengrui Cai · Rui Song

Abstract: We consider online optimization for multiple interacting agents that cooperate to achieve the best global reward. Here, interaction means that one agent's action and reward influence the rewards of other agents. Yet existing cooperative bandit methods primarily assume that agents are independent of each other. Multi-agent reinforcement learning methods, on the other hand, rely mainly on deep learning and are thus hard to interpret or to equip with theoretical guarantees. In this work, we leverage ideas from structural equation models to characterize different interactions among agents simultaneously, and propose multi-agent graph cooperative bandits (MAGNET), which integrate techniques from online decision making, graph structure modeling, and multi-agent systems. We allow heterogeneous individual rewards among the cooperative agents, and the interaction information may be partially known or completely unknown. Depending on whether the global objective is known, the framework consists of a global optimizer for a central controller or a local optimizer for decentralized agents, respectively. We derive a regret bound of $\mathcal{O}(\sqrt{KT})$ for the global optimizer and $\mathcal{O}(K\sqrt{T})$ for the local optimizer. Extensive simulation studies show the empirical validity of the proposed methods.
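
To make the gap between the two stated regret rates concrete, the following minimal sketch evaluates $\sqrt{KT}$ and $K\sqrt{T}$ side by side. It is not part of MAGNET. The abstract does not define $K$ and $T$, so reading them as the number of agents and the time horizon is an assumption, as is setting every constant and logarithmic factor hidden by $\mathcal{O}(\cdot)$ to 1. The function name `regret_rates` is hypothetical.

```python
import math


def regret_rates(num_agents: int, horizon: int) -> tuple[float, float]:
    """Return (sqrt(K*T), K*sqrt(T)) with all hidden constants set to 1.

    Assumption: K is the number of agents and T is the time horizon.
    These are growth rates only, not the actual regret of any algorithm.
    """
    global_rate = math.sqrt(num_agents * horizon)  # rate stated for the global optimizer
    local_rate = num_agents * math.sqrt(horizon)   # rate stated for the local optimizer
    return global_rate, local_rate


if __name__ == "__main__":
    for k in (4, 16, 64):
        g, l = regret_rates(k, 10_000)
        print(f"K={k:3d}  sqrt(KT)={g:8.1f}  K*sqrt(T)={l:8.1f}  ratio={l / g:.1f}")
```

Under these assumptions the ratio of the two rates is $\sqrt{K}$ for every $T$, so the difference between the two bounds grows with $K$ but not with $T$.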
