Keynote Talk in Workshop: Multi-Agent Security: Security as Key to AI Safety

Multi-Agent Vulnerabilities in Superhuman AI

Adam Gleave

2023

Abstract

Game-playing systems were among the first AI systems to reach superhuman performance, beating professionals at competitive games such as chess and Go. If AI systems are robust in any setting, we would expect it to be in such zero-sum games, where performance is almost synonymous with lack of exploitability. However, we recently found that several superhuman Go AIs are vulnerable to a simple adversarial strategy. In this talk, we will outline a threat model for multi-agent adversarial attacks and discuss prior vulnerabilities discovered under this threat model, before diving into the vulnerabilities in Go AIs. We will conclude by discussing possible mitigations to improve robustness.

Speaker

Adam Gleave

I am the founder of FAR AI, a non-profit alignment research institute.
