Progress in machine learning is measured by careful evaluation on problems of outstanding common interest. However, the proliferation of benchmark suites and environments, adversarial attacks, and other complications has diluted the basic evaluation model by overwhelming researchers with choices. Deliberate or accidental cherry-picking is increasingly likely, and designing well-balanced evaluation suites requires increasing effort. In this paper, we take a step back and propose Nash averaging. The approach builds on a detailed analysis of the algebraic structure of evaluation in two basic scenarios: agent-vs-agent and agent-vs-task. The key strength of Nash averaging is that it automatically adapts to redundancies in evaluation data, so that results are not biased by the incorporation of easy tasks or weak agents. Nash averaging thus encourages maximally inclusive evaluation, since there is no harm (computational cost aside) from including all available tasks and agents.
Author Information
David Balduzzi (DeepMind)
Karl Tuyls (DeepMind)
Julien Perolat (DeepMind)
Thore Graepel (DeepMind)
More from the Same Authors
2021 : Hidden Agenda: a Social Deduction Game with Diverse Learned Equilibria »
Kavya Kopparapu · Edgar Dueñez-Guzman · Jayd Matyas · Alexander Vezhnevets · John Agapiou · Kevin McKee · Richard Everett · Janusz Marecki · Joel Leibo · Thore Graepel -
2022 Poster: Turbocharging Solution Concepts: Solving NEs, CEs and CCEs with Neural Equilibrium Solvers »
Luke Marris · Ian Gemp · Thomas Anthony · Andrea Tacchetti · Siqi Liu · Karl Tuyls -
2020 : Q&A: Open Problems in Cooperative AI with Thore Graepel (DeepMind), Allan Dafoe (University of Oxford), Yoram Bachrach (DeepMind), and Natasha Jaques (Google) [moderator] »
Thore Graepel · Yoram Bachrach · Allan Dafoe · Natasha Jaques -
2020 : Open Problems in Cooperative AI: Thore Graepel (DeepMind) and Allan Dafoe (University of Oxford) »
Thore Graepel · Allan Dafoe -
2020 Workshop: Cooperative AI »
Thore Graepel · Dario Amodei · Vincent Conitzer · Allan Dafoe · Gillian Hadfield · Eric Horvitz · Sarit Kraus · Kate Larson · Yoram Bachrach -
2020 Poster: Learning to Play No-Press Diplomacy with Best Response Policy Iteration »
Thomas Anthony · Tom Eccles · Andrea Tacchetti · János Kramár · Ian Gemp · Thomas Hudson · Nicolas Porcel · Marc Lanctot · Julien Perolat · Richard Everett · Satinder Singh · Thore Graepel · Yoram Bachrach -
2020 Spotlight: Learning to Play No-Press Diplomacy with Best Response Policy Iteration »
Thomas Anthony · Tom Eccles · Andrea Tacchetti · János Kramár · Ian Gemp · Thomas Hudson · Nicolas Porcel · Marc Lanctot · Julien Perolat · Richard Everett · Satinder Singh · Thore Graepel · Yoram Bachrach -
2020 Tutorial: (Track3) Designing Learning Dynamics Q&A »
Marta Garnelo · David Balduzzi · Wojciech Czarnecki -
2020 Poster: Real World Games Look Like Spinning Tops »
Wojciech Czarnecki · Gauthier Gidel · Brendan Tracey · Karl Tuyls · Shayegan Omidshafiei · David Balduzzi · Max Jaderberg -
2020 Poster: Fictitious Play for Mean Field Games: Continuous Time Analysis and Applications »
Sarah Perrin · Julien Perolat · Mathieu Lauriere · Matthieu Geist · Romuald Elie · Olivier Pietquin -
2020 Tutorial: (Track3) Designing Learning Dynamics »
Marta Garnelo · David Balduzzi · Wojciech Czarnecki -
2019 : Invited talk: David Balduzzi (DeepMind) »
David Balduzzi -
2019 Poster: Biases for Emergent Communication in Multi-agent Reinforcement Learning »
Tom Eccles · Yoram Bachrach · Guy Lever · Angeliki Lazaridou · Thore Graepel -
2019 Poster: Multiagent Evaluation under Incomplete Information »
Mark Rowland · Shayegan Omidshafiei · Karl Tuyls · Julien Perolat · Michal Valko · Georgios Piliouras · Remi Munos -
2019 Spotlight: Multiagent Evaluation under Incomplete Information »
Mark Rowland · Shayegan Omidshafiei · Karl Tuyls · Julien Perolat · Michal Valko · Georgios Piliouras · Remi Munos -
2018 Poster: Actor-Critic Policy Optimization in Partially Observable Multiagent Environments »
Sriram Srinivasan · Marc Lanctot · Vinicius Zambaldi · Julien Perolat · Karl Tuyls · Remi Munos · Michael Bowling -
2018 Poster: Inequity aversion improves cooperation in intertemporal social dilemmas »
Edward Hughes · Joel Leibo · Matthew Phillips · Karl Tuyls · Edgar Dueñez-Guzman · Antonio García Castañeda · Iain Dunning · Tina Zhu · Kevin McKee · Raphael Koster · Heather Roff · Thore Graepel -
2017 Poster: A multi-agent reinforcement learning model of common-pool resource appropriation »
Julien Pérolat · Joel Leibo · Vinicius Zambaldi · Charles Beattie · Karl Tuyls · Thore Graepel -
2017 Poster: A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning »
Marc Lanctot · Vinicius Zambaldi · Audrunas Gruslys · Angeliki Lazaridou · Karl Tuyls · Julien Perolat · David Silver · Thore Graepel -
2016 : Concluding Remarks »
Thore Graepel · Frans Oliehoek · Karl Tuyls -
2016 : Introduction »
Thore Graepel · Karl Tuyls · Frans Oliehoek -
2016 Workshop: Learning, Inference and Control of Multi-Agent Systems »
Thore Graepel · Marc Lanctot · Joel Leibo · Guy Lever · Janusz Marecki · Frans Oliehoek · Karl Tuyls · Vicky Holgate