In reinforcement learning, agents learn by taking actions and observing their outcomes. Sometimes, it is desirable for a human operator to \textit{interrupt} an agent in order to prevent dangerous situations from happening. Yet, as part of their learning process, agents may link these interruptions, which affect their reward, to specific states, and deliberately avoid those states. The situation is particularly challenging in a multi-agent context because agents might learn not only from their own past interruptions, but also from those of other agents. Orseau and Armstrong~\cite{orseau2016safely} defined \emph{safe interruptibility} for a single learner, but their work does not naturally extend to multi-agent systems. This paper introduces \textit{dynamic safe interruptibility}, an alternative definition better suited to decentralized learning problems, and studies this notion in two learning frameworks: \textit{joint action learners} and \textit{independent learners}. We give realistic sufficient conditions on the learning algorithm that enable dynamic safe interruptibility in the case of joint action learners, yet show that these conditions are not sufficient for independent learners. We show, however, that if agents can detect interruptions, it is possible to prune the observations so as to ensure dynamic safe interruptibility even for independent learners.
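The abstract's final point — pruning observations that occur under an interruption so they never enter the learning rule — can be illustrated with a minimal sketch. This is not the paper's exact algorithm; the class name, the `interrupted` flag, and the tabular Q-learning setting are assumptions chosen for illustration.

```python
import random

class InterruptibleQLearner:
    """Hypothetical independent Q-learner that discards transitions
    observed while an operator interruption is active."""

    def __init__(self, n_states, n_actions, alpha=0.1, gamma=0.9, eps=0.1):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma, self.eps = alpha, gamma, eps
        self.n_actions = n_actions

    def act(self, state):
        # Standard epsilon-greedy action selection.
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        row = self.q[state]
        return max(range(self.n_actions), key=lambda a: row[a])

    def update(self, s, a, r, s_next, interrupted):
        # Pruning step: a transition observed under an interruption is
        # dropped, so the operator's override cannot bias the Q-values
        # and the agent has no learned incentive to avoid interruptions.
        if interrupted:
            return
        target = r + self.gamma * max(self.q[s_next])
        self.q[s][a] += self.alpha * (target - self.q[s][a])
```

The design choice mirrored here is that detection of interruptions is what makes pruning possible: without the `interrupted` signal, an independent learner cannot tell which observations to discard.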
Author Information
El Mahdi El-Mhamdi (EPFL)
Rachid Guerraoui (EPFL)
Hadrien Hendrikx (EPFL)
Alexandre Maurer (EPFL)
Related Events (a corresponding poster, oral, or spotlight)
- 2017 Poster: Dynamic Safe Interruptibility for Decentralized Multi-Agent Reinforcement Learning
  Thu. Dec 7th 02:30 -- 06:30 AM, Room Pacific Ballroom #204
More from the Same Authors
- 2021 Oral: Continuized Accelerations of Deterministic and Stochastic Gradient Descents, and of Gossip Algorithms
  Mathieu Even · Raphaël Berthier · Francis Bach · Nicolas Flammarion · Hadrien Hendrikx · Pierre Gaillard · Laurent Massoulié · Adrien Taylor
- 2021 Poster: Continuized Accelerations of Deterministic and Stochastic Gradient Descents, and of Gossip Algorithms
  Mathieu Even · Raphaël Berthier · Francis Bach · Nicolas Flammarion · Hadrien Hendrikx · Pierre Gaillard · Laurent Massoulié · Adrien Taylor
- 2021 Poster: Collaborative Learning in the Jungle (Decentralized, Byzantine, Heterogeneous, Asynchronous and Nonconvex Learning)
  El Mahdi El-Mhamdi · Sadegh Farhadkhani · Rachid Guerraoui · Arsany Guirguis · Lê-Nguyên Hoang · Sébastien Rouault
- 2017: Personalized and Private Peer-to-Peer Machine Learning
  Aurélien Bellet · Rachid Guerraoui · Marc Tommasi
- 2017: Dynamic Safe Interruptibility for Decentralized Multi-Agent Reinforcement Learning
  Hadrien Hendrikx
- 2017 Poster: Machine Learning with Adversaries: Byzantine Tolerant Gradient Descent
  Peva Blanchard · El Mahdi El-Mhamdi · Rachid Guerraoui · Julien Stainer