
Workshop: Deep Reinforcement Learning Workshop

Concept-based Understanding of Emergent Multi-Agent Behavior

Niko Grupen · Shayegan Omidshafiei · Natasha Jaques · Been Kim


This work studies concept-based interpretability in the context of multi-agent reinforcement learning (MARL). Unlike in supervised learning, where there have been efforts to understand a model's decisions, interpretability in multi-agent learning remains under-investigated. This is partly due to the increased complexity of the multi-agent setting (interpreting the decisions of multiple agents over time is combinatorially more complex than understanding individual, static decisions), but it also reflects the limited availability of tools for understanding multi-agent behavior. Interactions between agents, and coordination in general, remain difficult to gauge in MARL. In this work, we propose Concept Bottleneck Policies (CBPs) as a method for learning intrinsically interpretable, concept-based policies with MARL. We demonstrate that, by conditioning each agent's action on a set of human-understandable concepts, our method enables post-hoc behavioral analysis via concept intervention that is infeasible with standard policy architectures. Experiments show that concept interventions over CBPs reliably detect when agents have learned to coordinate with each other in environments that do not demand coordination, and also detect those environments in which coordination is required. Moreover, we find evidence that CBPs can detect coordination failures (such as lazy agents) and expose the low-level inter-agent information that underpins emergent coordination. Finally, we demonstrate that our approach matches the performance of standard, non-concept-based policies, thereby achieving interpretability without sacrificing performance.
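To make the idea of a concept bottleneck and a concept intervention concrete, here is a minimal Python sketch of a single agent's policy. It relies on stated assumptions. The class `ConceptBottleneckPolicy` and the helper `intervention_effect` are hypothetical names, as are the linear layers with random weights and the use of total variation distance to score an intervention. These are illustrative choices, not the authors' implementation. The property that matters is that the action distribution depends on the observation only through the predicted concepts. So overwriting a concept (an intervention) and measuring how far the action distribution shifts indicates how much the agent relies on that concept.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(W, v):
    # Plain matrix-vector product over nested lists.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


class ConceptBottleneckPolicy:
    """Hypothetical two-stage policy: observation -> concepts -> action distribution.

    Weights are random here purely for illustration; a real CBP would be trained.
    """

    def __init__(self, obs_dim, n_concepts, n_actions, seed=0):
        rng = random.Random(seed)
        # Concept predictor: one linear score per concept.
        self.W_c = [[rng.gauss(0, 1) for _ in range(obs_dim)] for _ in range(n_concepts)]
        # Action head: sees ONLY the concept vector, never the raw observation.
        self.W_a = [[rng.gauss(0, 1) for _ in range(n_concepts)] for _ in range(n_actions)]

    def concepts(self, obs):
        # Each concept is a value in (0, 1) predicted from the observation.
        return [sigmoid(z) for z in matvec(self.W_c, obs)]

    def action_probs(self, obs, interventions=None):
        c = self.concepts(obs)
        # Concept intervention (assumed form): overwrite selected concepts
        # with externally chosen values before the action head runs.
        for idx, value in (interventions or {}).items():
            c[idx] = value
        return softmax(matvec(self.W_a, c))


def intervention_effect(policy, obs, concept_idx, value):
    """Total variation distance between action distributions with and without
    setting concept `concept_idx` to `value` (an assumed scoring choice)."""
    p = policy.action_probs(obs)
    q = policy.action_probs(obs, {concept_idx: value})
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


if __name__ == "__main__":
    policy = ConceptBottleneckPolicy(obs_dim=4, n_concepts=3, n_actions=5, seed=1)
    obs = [0.5, -1.0, 0.2, 0.0]
    for k in range(3):
        print(k, round(intervention_effect(policy, obs, k, 1.0), 4))
```

If, for example, concept 2 encoded a teammate's position (a hypothetical labeling), a near-zero effect across many observations would suggest the agent ignores that teammate, while a consistently large effect would suggest its behavior is conditioned on the teammate. Averaging such scores over trajectories is one plausible way to probe for emergent coordination in the spirit of the analysis described above.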
