
Workshop: Workshop on Machine Learning Safety

How Sure to Be Safe? Difficulty, Confidence and Negative Side Effects

John Burden · José Hernández-Orallo · Sean O hEigeartaigh


A principal concern for AI systems is the occurrence of negative side effects, such as a robot cleaner breaking a vase. This is critical when these systems use machine learning models that were trained to maximise performance, without knowledge of or feedback about the negative side effects. Within Vase World and SafeLife, two safety benchmarking domains, we analyse side effects during operation and demonstrate that their magnitude is influenced by task difficulty. Using two forms of confidence measure, we demonstrate that wrapping existing RL agents with safety policies that activate when the agent's confidence falls below a specified threshold extends the Pareto frontier of both performance and safety.
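The wrapping idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say which two confidence measures are used, so the maximum-softmax-probability measure below is an assumed stand-in, and the names `ConfidenceGatedAgent`, `max_softmax_confidence`, `task_policy` and `safe_policy` are hypothetical. The sketch shows only the gating logic: act with the trained policy when confidence is at or above the threshold, and fall back to a safety policy otherwise.

```python
import math
from typing import Callable, Sequence

Observation = Sequence[float]  # assumption: the observation is a vector of action values


def max_softmax_confidence(action_values: Sequence[float]) -> float:
    """Assumed confidence measure: the largest softmax probability over action values."""
    peak = max(action_values)
    # Subtract the peak before exponentiating for numerical stability.
    exps = [math.exp(v - peak) for v in action_values]
    return max(exps) / sum(exps)


class ConfidenceGatedAgent:
    """Wraps a task policy with a safety policy that takes over at low confidence."""

    def __init__(
        self,
        task_policy: Callable[[Observation], int],
        safe_policy: Callable[[Observation], int],
        confidence: Callable[[Observation], float],
        threshold: float,
    ) -> None:
        self.task_policy = task_policy
        self.safe_policy = safe_policy
        self.confidence = confidence
        self.threshold = threshold

    def act(self, observation: Observation) -> int:
        # The safety policy activates only when confidence falls below the threshold.
        if self.confidence(observation) < self.threshold:
            return self.safe_policy(observation)
        return self.task_policy(observation)


# Hypothetical policies for illustration only.
def greedy_policy(action_values: Observation) -> int:
    """Pick the action with the highest value."""
    return max(range(len(action_values)), key=lambda i: action_values[i])


def noop_policy(action_values: Observation) -> int:
    """Always pick action 0, assumed here to be a harmless no-op."""
    return 0


agent = ConfidenceGatedAgent(
    task_policy=greedy_policy,
    safe_policy=noop_policy,
    confidence=max_softmax_confidence,
    threshold=0.6,
)
```

Sweeping `threshold` from 0 to 1 moves the wrapped agent from always using the task policy to always using the safety policy, which is one way a performance and safety trade-off curve could be traced under these assumptions.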
