Recently, game-playing agents based on AI techniques have demonstrated superhuman performance in several sequential games, such as chess, Go, and poker. Surprisingly, the multi-agent learning techniques that made these achievements possible do not take into account the actual behavior of the human player, potentially leading to a substantial gap in performance. In this paper, we address the problem of designing artificial agents that learn how to effectively exploit unknown human opponents while playing repeatedly against them in an online fashion. We study the case in which the agent's strategy during each repetition of the game is subject to constraints ensuring that the human's expected utility lies between given lower and upper thresholds. Our framework encompasses several real-world problems, such as human engagement in repeated game playing and human education by means of serious games. As a first result, we formalize a set of linear inequalities encoding the conditions that the agent's strategy must satisfy at each iteration in order not to violate the given bounds on the human's expected utility. Then, we use this formulation in an upper confidence bound algorithm, and we prove that the resulting procedure suffers sublinear regret and guarantees that the constraints are satisfied with high probability at each iteration. Finally, we empirically evaluate the convergence of our algorithm on standard testbeds of sequential games.
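To make the role of the linear inequalities concrete, here is a minimal sketch under strong simplifying assumptions that are not taken from the paper: a one-shot matrix game rather than a sequential game, and a human mixed strategy `y` that is already known rather than learned. The names `human_utility_coefficients` and `satisfies_bounds`, the payoff matrix `U`, and the thresholds are hypothetical. The point is only that, once `y` is fixed, the human's expected utility is linear in the agent's strategy `x`, so the two thresholds become two linear inequalities in `x`.

```python
from typing import List


def human_utility_coefficients(U: List[List[float]], y: List[float]) -> List[float]:
    """Return c with c[i] = sum_j U[i][j] * y[j].

    U[i][j] is the human's payoff when the agent plays action i and the
    human plays action j (hypothetical matrix-game simplification).
    """
    return [sum(u_ij * y_j for u_ij, y_j in zip(row, y)) for row in U]


def satisfies_bounds(
    x: List[float],
    U: List[List[float]],
    y: List[float],
    lower: float,
    upper: float,
) -> bool:
    """Check the two linear inequalities lower <= c . x <= upper."""
    c = human_utility_coefficients(U, y)
    value = sum(c_i * x_i for c_i, x_i in zip(c, x))
    return lower <= value <= upper


# Assumed toy instance: the human gets payoff 1 only when both pick action 0.
U = [[1.0, 0.0],
     [0.0, 0.0]]
y = [1.0, 0.0]           # human always plays action 0 (assumed known here)
lower, upper = 0.2, 0.6  # hypothetical thresholds on the human's expected utility

too_generous = satisfies_bounds([0.9, 0.1], U, y, lower, upper)  # utility 0.9
acceptable = satisfies_bounds([0.5, 0.5], U, y, lower, upper)    # utility 0.5
```

In the setting described above the human's behavior is unknown, so a learning procedure would have to replace the fixed `y` with estimates and confidence bounds; this sketch does not attempt that.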
Author Information
Martino Bernasconi (Politecnico di Milano)
Federico Cacciamani (Politecnico di Milano)
Simone Fioravanti (Gran Sasso Science Institute (GSSI))
Nicola Gatti (Politecnico di Milano)
Alberto Marchesi (Politecnico di Milano)
Francesco Trovò (Politecnico di Milano)
More from the Same Authors
-
2021 : The Evolutionary Dynamics of Soft-Max Policy Gradient in Multi-Agent Settings »
Martino Bernasconi · Federico Cacciamani · Simone Fioravanti · Nicola Gatti · Francesco Trovò -
2021 : Public Information Representation for Adversarial Team Games »
Luca Carminati · Federico Cacciamani · Marco Ciccone · Nicola Gatti -
2022 : Multi-Armed Bandit Problem with Temporally-Partitioned Rewards »
Giulia Romano · Andrea Agostini · Francesco Trovò · Nicola Gatti · Marcello Restelli -
2022 : A General Framework for Safe Decision Making: A Convex Duality Approach »
Martino Bernasconi · Federico Cacciamani · Nicola Gatti · Francesco Trovò -
2022 : A Unifying Framework for Online Safe Optimization »
Matteo Castiglioni · Andrea Celli · Alberto Marchesi · Giulia Romano · Nicola Gatti -
2022 Poster: Sequential Information Design: Learning to Persuade in the Dark »
Martino Bernasconi · Matteo Castiglioni · Alberto Marchesi · Nicola Gatti · Francesco Trovò -
2022 Poster: A Unifying Framework for Online Optimization with Long-Term Constraints »
Matteo Castiglioni · Andrea Celli · Alberto Marchesi · Giulia Romano · Nicola Gatti -
2022 Poster: Subgame Solving in Adversarial Team Games »
Brian Zhang · Luca Carminati · Federico Cacciamani · Gabriele Farina · Pierriccardo Olivieri · Nicola Gatti · Tuomas Sandholm -
2021 : Spotlight Talk: Public Information Representation for Adversarial Team Games »
Luca Carminati · Federico Cacciamani · Marco Ciccone · Nicola Gatti -
2020 Poster: Online Bayesian Persuasion »
Matteo Castiglioni · Andrea Celli · Alberto Marchesi · Nicola Gatti -
2020 Poster: No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium »
Andrea Celli · Alberto Marchesi · Gabriele Farina · Nicola Gatti -
2020 Spotlight: Online Bayesian Persuasion »
Matteo Castiglioni · Andrea Celli · Alberto Marchesi · Nicola Gatti -
2020 Oral: No-Regret Learning Dynamics for Extensive-Form Correlated Equilibrium »
Andrea Celli · Alberto Marchesi · Gabriele Farina · Nicola Gatti -
2019 Poster: Learning to Correlate in Multi-Player General-Sum Sequential Games »
Andrea Celli · Alberto Marchesi · Tommaso Bianchi · Nicola Gatti -
2018 Poster: Practical exact algorithm for trembling-hand equilibrium refinements in games »
Gabriele Farina · Nicola Gatti · Tuomas Sandholm -
2018 Poster: Ex ante coordination and collusion in zero-sum multi-player extensive-form games »
Gabriele Farina · Andrea Celli · Nicola Gatti · Tuomas Sandholm