Learning safe solutions is an important but challenging problem in multi-agent reinforcement learning (MARL). Shielded reinforcement learning is one approach for preventing agents from choosing unsafe actions. Current shielded reinforcement learning methods for MARL make strong assumptions about communication and full observability. In this work, we extend the formalization of the shielded reinforcement learning problem to a decentralized multi-agent setting. We then present an algorithm for decomposition of a centralized shield, allowing shields to be used in such decentralized, communication-free environments. Our results show that agents equipped with decentralized shields perform comparably to agents with centralized shields in several tasks, allowing shielding to be used in environments with decentralized training and execution for the first time.
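To illustrate the general concept of shielding referenced in the abstract (not the paper's specific decomposition algorithm), below is a minimal Python sketch. The `is_safe` predicates, the per-agent shield dictionary, and the fallback-to-random-safe-action policy are all hypothetical simplifications assumed for illustration; they do not reflect the authors' implementation.

```python
import random
from typing import Callable, Dict, List

Action = int
State = int  # hypothetical discrete local observation/state


def shield_action(local_state: State,
                  proposed: Action,
                  actions: List[Action],
                  is_safe: Callable[[State, Action], bool]) -> Action:
    """Generic shield: keep the proposed action if the safety predicate
    accepts it; otherwise substitute a randomly chosen safe action."""
    if is_safe(local_state, proposed):
        return proposed
    safe = [a for a in actions if is_safe(local_state, a)]
    if not safe:
        # Hypothetical failure mode: the shield cannot recover in this state.
        raise RuntimeError("No safe action available in this state")
    return random.choice(safe)


def shield_joint_action(local_states: Dict[str, State],
                        proposed: Dict[str, Action],
                        actions: List[Action],
                        local_shields: Dict[str, Callable[[State, Action], bool]]
                        ) -> Dict[str, Action]:
    """Decentralized flavor: each agent applies only its own local shield,
    with no communication between agents at execution time."""
    return {agent: shield_action(local_states[agent], proposed[agent],
                                 actions, local_shields[agent])
            for agent in proposed}
```

In the centralized setting, a single shield would monitor the joint action of all agents; the sketch above instead shows the communication-free case where each agent filters only its own action through a local shield.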
Author Information
Daniel Melcer (Northeastern University)
Christopher Amato (Northeastern University)
Stavros Tripakis (Northeastern University)