Workshop
Multi-Agent Security: Security as Key to AI Safety
Christian Schroeder de Witt · Hawra Milani · Klaudia Krawiecka · Swapneel Mehta · Carla Cremer · Martin Strohmeier
Room 223
This workshop proposal builds on the observation that the AI and cyber security communities are currently not sufficiently interconnected to navigate risks and opportunities in our multi-agent world. Through a series of discussions involving experts and audiences, provocation and intervention keynotes, and contributed content, we aim to compare, contrast, and synthesize near- and long-term perspectives of AI deployment across society. The fundamental goal of this workshop is to bring together researchers, practitioners, and activists across AI and cyber security in order to create a blueprint for the future of AI security in a multi-agent world, and to define, explore, and challenge the nascent field of multi-agent security (MASEC).
Submission deadline: September 25, 2023
Acceptance Notification: October 27, 2023
Workshop date: December 16, 2023
Schedule
Sat 7:00 a.m. - 7:10 a.m.
|
Opening Remarks
(
Moderated
)
>
SlidesLive Video |
🔗 |
Sat 7:10 a.m. - 7:35 a.m.
|
Multi-Agent Risks from Advanced AI
(
Keynote
)
>
SlidesLive Video I will outline the subject of a forthcoming report, which argues that: i) a world of advanced multi-agent systems is coming soon, including in high-stakes situations; ii) these settings present qualitatively different kinds of risks to the single-agent case; and iii) not enough work is being done on this right now, but there are lots of ways to make progress. In so doing, I will identify several key failure modes and risk factors, and provide multiple concrete examples. I will conclude with a set of priorities for future research. |
Lewis Hammond 🔗 |
Sat 7:35 a.m. - 8:00 a.m.
|
Key Challenges in Foundation Models (... and some solutions!)
(
Keynote
)
>
SlidesLive Video Thanks to neural networks (NNs), faster computation, and massive datasets, machine learning is under increasing pressure to provide automated solutions to even harder real-world tasks beyond human performance with ever faster response times due to potentially huge technological and societal benefits. Unsurprisingly, the NN learning formulations present fundamental challenges to the back-end learning algorithms despite their scalability. In this talk, we will work backwards from the "customer's" perspective and highlight these challenges specifically on the Foundation Models based on NNs. We will then explain our solutions to some of these challenges, focusing mostly on robustness aspects. In particular, we will show how the existing theory and methodology for robust training misses the mark and how we can bridge the theory and the practice. Bio: Volkan Cevher received the B.Sc. (valedictorian) in electrical engineering from Bilkent University in Ankara, Turkey, in 1999 and the Ph.D. in electrical and computer engineering from the Georgia Institute of Technology in Atlanta, GA in 2005. He was a Research Scientist with the University of Maryland, College Park, from 2006 to 2007 and also with Rice University in Houston, TX, from 2008 to 2009. He was also a Faculty Fellow in the Electrical and Computer Engineering Department at Rice University from 2010 to 2020. Currently, he is an Associate Professor at the Swiss Federal Institute of Technology Lausanne and an Amazon Scholar. His research interests include machine learning, optimization theory and methods, and automated control. Dr. Cevher is an IEEE Fellow ('24), an ELLIS fellow, and was the recipient of the ICML AdvML Best Paper Award in 2023, Google Faculty Research award in 2018, the IEEE Signal Processing Society Best Paper Award in 2016, a Best Paper Award at CAMSAP in 2015, a Best Paper Award at SPARS in 2009, and an ERC CG in 2016 as well as an ERC StG in 2011. |
Volkan Cevher 🔗 |
Sat 8:00 a.m. - 8:25 a.m.
|
Multi-Agent Vulnerabilities in Superhuman AI
(
Keynote Talk
)
>
SlidesLive Video Game-playing systems were among the first AI systems to reach superhuman performance, beating professionals in competitive games like chess and Go. If AIs are robust in any setting, we would expect it to be in such zero-sum games, where performance is almost synonymous with lack of exploitability. However, we recently found that a variety of superhuman Go AIs are vulnerable to a simple adversarial strategy. In this talk, we will outline a threat model for multi-agent adversarial attacks, discuss prior vulnerabilities discovered under this threat model, before diving into vulnerabilities in Go AIs. We will conclude by discussing possible mitigations to improve robustness. |
Adam Gleave 🔗 |
Sat 8:24 a.m. - 8:25 a.m.
|
Towards AI-based auditing of privacy risks in privacy-enhancing technologies
(
[On-Demand] Keynote
)
>
SlidesLive Video The large-scale collection and availability of data is changing how we do science and make decisions. We are witnessing a huge demand to share data, especially in the medical, public and financial sectors. Large-scale data is also at the core of recent progress of large language models. A key question is how to share data without putting people's privacy at risk. It turns out that this is quite hard, as people can be easily re-identified based on a few pieces of information. Can we use AI to design more powerful attacks and, in this way, audit the privacy offered by different systems? We envision two directions: first, given some attack, can we improve its performance using AI? Finding the best possible attacks or stronger attacks in general means that we are getting tighter estimates of the risk. This means that we are less likely to put a dataset or a set of aggregates out there when they are not safe. The second direction relates to discovering new attacks: can we develop tools to discover new attacks or automate the search for vulnerabilities? In this keynote I am going to show you two examples of using AI for automated auditing, in the database and query release settings. |
Ana-Maria Cretu 🔗 |
Sat 8:24 a.m. - 8:25 a.m.
|
Recent Advances on Online Learning in Games.
(
[On-Demand] Keynote
)
>
SlidesLive Video
In this talk we will present recent results on the convergence rate of online learning algorithms in the context of multi-player normal-form games. It is established that when all agents in a normal-form game employ a no-regret algorithm, the time-averaged joint strategy profile converges to an $\epsilon$-approximate Coarse Correlated Equilibrium at a rate of $O(1/\epsilon^2)$. However, recent works have delved into online learning algorithms that enhance the convergence rate to $\tilde{O}(1/\epsilon)$ once adopted by all agents. Our talk will cover the recent results for Optimistic Hedge [1], Clairvoyant MWU [2], and Follow the Perturbed Leader [3].
[1] Near-Optimal No-Regret Learning in General Games, [Daskalakis et al., NeurIPS 2021]
[2] Beyond Time-Average Convergence: Near-Optimal Uncoupled Online Learning via Clairvoyant Multiplicative Weights Update [Piliouras et al, NeurIPS 2022]
[3] Uncoupled Learning Dynamics with O(log T) Swap Regret in Multiplayer Games, [Anagnostides et al. NeurIPS 2022]
|
Stratis Skoulakis 🔗 |
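As a rough illustration of the no-regret dynamics this abstract refers to, the sketch below runs plain Hedge (multiplicative weights) for both players of a two-player zero-sum matrix game and returns the time-averaged strategies. In the zero-sum case these averages approach a Nash equilibrium, which is a special case of the coarse correlated equilibrium convergence described above. This is vanilla Hedge, not Optimistic Hedge, Clairvoyant MWU, or Follow the Perturbed Leader; the game (matching pennies), the step size `eta`, the starting strategies, and all function names are assumptions chosen for illustration.

```python
import math


def hedge_update(strategy, losses, eta):
    """One Hedge step: reweight each action by exp(-eta * loss), then renormalize."""
    scaled = [p * math.exp(-eta * loss) for p, loss in zip(strategy, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]


def run_hedge(payoff, x0, y0, rounds=5000, eta=0.05):
    """Both players run Hedge on expected losses; returns time-averaged strategies.

    payoff[i][j] is the row player's payoff; the column player receives its negative.
    """
    x, y = list(x0), list(y0)
    n, m = len(x), len(y)
    sum_x, sum_y = [0.0] * n, [0.0] * m
    for _ in range(rounds):
        sum_x = [s + p for s, p in zip(sum_x, x)]
        sum_y = [s + q for s, q in zip(sum_y, y)]
        # Expected loss of each pure action against the opponent's current mix.
        row_losses = [-sum(payoff[i][j] * y[j] for j in range(m)) for i in range(n)]
        col_losses = [sum(payoff[i][j] * x[i] for i in range(n)) for j in range(m)]
        # Simultaneous update: both players react to the previous round only.
        x = hedge_update(x, row_losses, eta)
        y = hedge_update(y, col_losses, eta)
    return [s / rounds for s in sum_x], [s / rounds for s in sum_y]


if __name__ == "__main__":
    matching_pennies = [[1, -1], [-1, 1]]
    avg_x, avg_y = run_hedge(matching_pennies, x0=[0.9, 0.1], y0=[0.2, 0.8])
    print(avg_x, avg_y)  # both averages should be close to the uniform strategy
```

The individual iterates of plain Hedge keep cycling in this game; only the averages settle, which is why the function accumulates `sum_x` and `sum_y` rather than returning the final `x` and `y`.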
Sat 8:25 a.m. - 8:40 a.m.
|
Language Agents as Hackers: Evaluating Cybersecurity Skills with Capture the Flag
(
Oral
)
>
link
SlidesLive Video Amidst the advent of language models (LMs) and their wide-ranging capabilities, concerns have been raised about their implications with regards to privacy and security. In particular, the emergence of language agents as a promising aid for automating and augmenting digital work poses immediate questions concerning their misuse as malicious cybersecurity actors. With their exceptional compute efficiency and execution speed relative to human counterparts, language agents may be extremely adept at locating vulnerabilities, performing complex social engineering, and hacking real world systems. Understanding and guiding the development of language agents in the cybersecurity space requires a grounded understanding of their capabilities founded on empirical data and demonstrations. To address this need, we introduce InterCode-CTF, a novel task environment and benchmark for evaluating language agents on the Capture the Flag (CTF) task. Built as a facsimile of real world CTF competitions, in the InterCode-CTF environment, a language agent is tasked with finding a flag from a purposely-vulnerable computer program. We manually collect and verify a benchmark of 100 task instances that require a number of cybersecurity skills such as reverse engineering, forensics, and binary exploitation, then evaluate current top-notch LMs on this evaluation set. Our preliminary findings indicate that while language agents possess rudimentary cybersecurity knowledge, they are not able to perform multi-step cybersecurity tasks out-of-the-box. |
John Yang · Akshara Prabhakar · Shunyu Yao · Kexin Pei · Karthik Narasimhan 🔗 |
Sat 8:40 a.m. - 8:55 a.m.
|
Leading the Pack: N-player Opponent Shaping
(
Oral
)
>
link
SlidesLive Video Reinforcement learning solutions have had great success in the 2-player general sum setting. In this setting, the paradigm of Opponent Shaping (OS), in which agents account for the learning of their co-players, has led to agents which are able to avoid collectively bad outcomes, whilst also maximizing their reward. These methods have currently been limited to 2-player games. However, the real world involves interactions with many more agents, with interactions on both local and global scales. In this paper, we extend Opponent Shaping (OS) methods to environments involving multiple co-players and multiple shaping agents. We evaluate on 4 different environments, varying the number of players from 3 to 5, and demonstrate that model-based OS methods converge to equilibrium with better global welfare than naive learning. However, we find that when playing with a large number of co-players, OS methods' relative performance reduces, suggesting that in the limit OS methods may not perform well. Finally, we explore scenarios where more than one OS method is present, noticing that within games requiring a majority of cooperating agents, OS methods converge to outcomes with poor global welfare. |
Alexandra Souly · Timon Willi · Akbir Khan · Robert Kirk · Chris Lu · Edward Grefenstette · Tim Rocktäschel 🔗 |
Sat 9:05 a.m. - 9:20 a.m.
|
Cooperative AI via Decentralized Commitment Devices
(
Oral
)
>
link
SlidesLive Video Credible commitment devices have been a popular approach for robust multi-agent coordination. However, existing commitment mechanisms face limitations like privacy, integrity, and susceptibility to mediator or user strategic behavior. It is unclear if the cooperative AI techniques we study are robust to real-world incentives and attack vectors. Fortunately, decentralized commitment devices that utilize cryptography have been deployed in the wild, and numerous studies have shown their ability to coordinate algorithmic agents, especially when agents face rational or sometimes adversarial opponents with significant economic incentives, currently on the order of several million to billions of dollars. In this paper, we illustrate potential security issues in cooperative AI via examples in the decentralization literature and, in particular, Maximal Extractable Value (MEV). We call for expanded research into decentralized commitments to advance cooperative AI capabilities for secure coordination in open environments and empirical testing frameworks to evaluate multi-agent coordination ability given real-world commitment constraints. |
Xyn Sun · Davide Crapis · Matt Stephenson · Jonathan Passerat-Palmbach 🔗 |
Sat 9:30 a.m. - 10:30 a.m.
|
Panel Debate: How can we make AI more secure?
(
Panel Debate
)
>
SlidesLive Video Panelists: Sanja Šćepanović-Stojanović (Senior Research Scientist, Nokia Bell Labs); Stephen McAleer (Postdoc, CMU); Adam Gleave (CEO, FAR.ai); Esben Kran (CEO, Apart Research). Moderator: Klaudia Krawiecka |
🔗 |
Sat 10:30 a.m. - 12:00 p.m.
|
Poster Session
(
Poster Session & Lunch break
)
>
|
🔗 |
Sat 12:10 p.m. - 12:25 p.m.
|
I See You! Robust Measurement of Adversarial Behavior
(
Oral
)
>
link
SlidesLive Video We introduce the study of non-manipulable measures of manipulative behavior in multi-agent systems. We do this through a case study of decentralized finance (DeFi) and blockchain systems, which are salient as real-world, rapidly emerging multi-agent systems with financial incentives for malicious behavior, for the participation in algorithmic and AI systems, and for the need for new methods with which to measure levels of manipulative behavior. We introduce a new surveillance metric for measuring malicious behavior and demonstrate its effectiveness in a natural experiment on the Uniswap DeFi ecosystem. |
Lars Ankile · Matheus Xavier Ferreira · David Parkes 🔗 |
Sat 12:25 p.m. - 12:40 p.m.
|
Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning
(
Oral
)
>
link
SlidesLive Video Stackelberg equilibria arise naturally in a range of popular learning problems, such as in security games or indirect mechanism design, and have received increasing attention in the reinforcement learning literature. We present a general framework for implementing Stackelberg equilibria search as a multi-agent RL problem, allowing a wide range of algorithmic design choices. We discuss how previous approaches can be seen as specific instantiations of this framework. As a key insight, we note that the design space allows for approaches not previously seen in the literature, for instance by leveraging multitask and meta-RL techniques for follower convergence. We propose one such approach using contextual policies, and evaluate it experimentally on both standard and novel benchmark domains, showing greatly improved sample efficiency compared to previous approaches. Finally, we explore the effect of adopting algorithm designs outside the borders of our framework. |
Matthias Gerstgrasser · David Parkes 🔗 |
Sat 12:40 p.m. - 12:50 p.m.
|
Defining and Mitigating Collusion in Multi-Agent Systems
(
Spotlight
)
>
link
SlidesLive Video Collusion between learning agents is increasingly becoming a topic of concern with the advent of more powerful, complex multi-agent systems. In contrast to existing work in narrow settings, we present a general formalisation of collusion between learning agents in partially-observable stochastic games. We discuss methods for intervening on a game to mitigate collusion and provide theoretical as well as empirical results demonstrating the effectiveness of three such interventions. |
Jack Foxabbott · Sam Deverett · Kaspar Senft · Samuel Dower · Lewis Hammond 🔗 |
Sat 12:50 p.m. - 1:00 p.m.
|
Second-order Jailbreaks: Generative Agents Successfully Manipulate Through an Intermediary
(
Spotlight
)
>
link
SlidesLive Video As the capabilities of Large Language Models (LLMs) continue to expand, their application in communication tasks is becoming increasingly prevalent. However, this widespread use brings with it novel risks, including the susceptibility of LLMs to "jailbreaking" techniques. In this paper, we explore the potential for such risks in two- and three-agent communication networks, where one agent is tasked with protecting a password while another attempts to uncover it. Our findings reveal that an attacker, powered by advanced LLMs, can extract the password even through an intermediary that is instructed to prevent this. Our contributions include an experimental setup for evaluating the persuasiveness of LLMs, a demonstration of LLMs' ability to manipulate each other into revealing protected information, and a comprehensive analysis of this manipulative behavior. Our results underscore the need for further investigation into the safety and security of LLMs in communication networks. |
Mikhail Terekhov · Romain Graux · Eduardo Neville · Denis Rosset · Gabin Kolly 🔗 |
Sat 1:00 p.m. - 1:10 p.m.
|
Harnessing the Power of Federated Learning in Federated Contextual Bandits
(
Spotlight
)
>
link
SlidesLive Video Federated contextual bandits (FCB), as a pivotal instance of combining federated learning (FL) and sequential decision-making, have received growing interest in recent years. However, existing FCB designs often adopt FL protocols tailored for specific settings, deviating from the canonical FL framework. Such disconnections not only prohibit these designs from flexibly leveraging canonical FL algorithmic approaches but also set considerable barriers for FCB to incorporate growing studies on FL attributes such as robustness and privacy. To promote a closer relationship between FL and FCB, we propose a novel FCB design, FedIGW, which can flexibly incorporate both existing and future FL protocols and thus is capable of harnessing the full spectrum of FL advances. |
Chengshuai Shi · Kun Yang · Ruida Zhou · Cong Shen 🔗 |
Sat 1:10 p.m. - 1:20 p.m.
|
Beyond Worst-case Attacks: Robust RL with Adaptive Defense via Non-dominated Policies
(
Spotlight
)
>
link
Considerable focus has been directed towards ensuring that reinforcement learning (RL) policies are robust to adversarial attacks during test time. While current approaches are effective against strong attacks for potential worst-case scenarios, these methods often compromise performance in the absence of attacks or the presence of only weak attacks. To address this, we study policy robustness under the well-accepted state-adversarial attack model, extending our focus beyond merely worst-case attacks. We refine the baseline policy class $\Pi$ prior to test time, aiming for efficient adaptation within a compact, finite policy class $\tilde{\Pi}$, which can resort to an adversarial bandit subroutine. We then propose a novel training-time algorithm to iteratively discover non-dominated policies, forming a near-optimal and minimal $\tilde{\Pi}$. Empirical validation on MuJoCo corroborates the superiority of our approach in terms of natural and robust performance, as well as adaptability to various attack scenarios.
|
Xiangyu Liu · Chenghao Deng · Yanchao Sun · Yongyuan Liang · Furong Huang 🔗 |
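To make the "adversarial bandit subroutine" over a finite policy class more concrete, here is a minimal sketch of EXP3 choosing among a handful of candidate policies at test time. The abstract does not say which bandit algorithm is used, so EXP3 is an assumed stand-in; the class name, the exploration rate `gamma`, and the fixed per-policy returns in the demo are hypothetical and not taken from the paper.

```python
import math
import random


class Exp3:
    """EXP3 over a finite set of arms (here: candidate policies)."""

    def __init__(self, n_arms, gamma=0.1, seed=0):
        self.n = n_arms
        self.gamma = gamma  # exploration rate in (0, 1]
        self.weights = [1.0] * n_arms
        self.rng = random.Random(seed)

    def probabilities(self):
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.n
                for w in self.weights]

    def select(self):
        probs = self.probabilities()
        return self.rng.choices(range(self.n), weights=probs)[0]

    def update(self, arm, reward):
        """Update with the observed reward in [0, 1] of the arm that was played."""
        prob = self.probabilities()[arm]
        estimate = reward / prob  # importance-weighted reward estimate
        self.weights[arm] *= math.exp(self.gamma * estimate / self.n)
        # Rescale to avoid overflow; the sampling probabilities are unchanged.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


def run_demo(rounds=2000):
    """Hypothetical episode returns of three candidate policies under one attack."""
    returns = [0.2, 0.5, 0.8]
    bandit = Exp3(n_arms=len(returns))
    counts = [0] * len(returns)
    for _ in range(rounds):
        arm = bandit.select()
        counts[arm] += 1
        bandit.update(arm, returns[arm])
    return counts


if __name__ == "__main__":
    print(run_demo())  # pull counts per candidate policy
```

Keeping the candidate set small matters here because the regret of a bandit of this kind grows with the number of arms, which is consistent with the abstract's emphasis on a compact, minimal $\tilde{\Pi}$.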
Sat 1:20 p.m. - 1:30 p.m.
|
Dynamics Model Based Adversarial Training For Competitive Reinforcement Learning
(
Spotlight
)
>
link
SlidesLive Video Adversarial perturbations substantially degrade the performance of Deep Reinforcement Learning (DRL) agents, reducing the applicability of DRL in practice. Existing adversarial training for robustifying DRL uses information about the agent at the current step to minimize the loss upper bound introduced by adversarial input perturbations. However, it only works well for single-agent tasks. The enhanced controversy in two-agent games introduces more dynamics and makes existing methods less effective. Inspired by model-based RL that builds a model for the environment transition probability, we propose a dynamics model based adversarial training framework for modeling multi-step state transitions. Our dynamics model transitively predicts future states, which can provide more precise back-propagated future information during adversarial perturbation generation, and hence improve the agent's empirical robustness substantially under different attacks. Our experiments on four two-agent competitive MuJoCo games show that our method consistently outperforms state-of-the-art adversarial training techniques in terms of empirical robustness and normal functionalities of DRL agents. |
Xuan Chen · Guanhong Tao · Xiangyu Zhang 🔗 |
Sat 1:30 p.m. - 1:40 p.m.
|
RAVE: Enabling safety verification for realistic deep reinforcement learning systems
(
Spotlight
)
>
link
SlidesLive Video Recent advancements in reinforcement learning (RL) have expedited its success across a wide range of decision-making problems. However, a lack of safety guarantees restricts its use in critical tasks. While recent work has proposed several verification techniques to provide such guarantees, they require that the state-transition function be known and the reinforcement learning policy be deterministic. Both of these properties may not be true in real environments, which significantly limits the use of existing verification techniques. In this work, we propose two approximation strategies that address the limitation of prior work, allowing the safety verification of RL policies. We demonstrate that by augmenting state-of-the-art verification techniques with our proposed approximation strategies, we can guarantee the safety of non-deterministic RL policies operating in environments with unknown state-transition functions. We theoretically prove that our technique guarantees the safety of an RL policy at runtime. Our experiments on three representative RL tasks empirically verify the efficacy of our method in providing a safety guarantee to a target agent while maintaining its task execution performance. |
Wenbo Guo · Taesung Lee · Kevin Eykholt · Jiyong Jang 🔗 |
Sat 1:40 p.m. - 1:50 p.m.
|
Multiagent Simulators for Social Networks
(
Spotlight
)
>
link
SlidesLive Video Multiagent social network simulations are an avenue that can bridge the communication gap between the public and private platforms in order to develop solutions to a complex array of issues relating to online safety. While there are significant challenges relating to the scale of multiagent simulations, efficient learning from observational and interventional data to accurately model micro and macro-level emergent effects, there are equally promising opportunities not least with the advent of large language models that provide an expressive approximation of user behavior. In this position paper, we review prior art relating to social network simulation, highlighting challenges and opportunities for future work exploring multiagent security using agent-based models of social networks. |
Aditya Surve · Archit Rathod · Mokshit Surana · Gautam Malpani · Aneesh Shamraj · SAINATH SANKEPALLY · Raghav Jain · Swapneel Mehta 🔗 |
Sat 1:50 p.m. - 2:00 p.m.
|
Robustness to Multi-Modal Environment Uncertainty in MARL using Curriculum Learning
(
Spotlight
)
>
link
SlidesLive Video Multi-agent reinforcement learning (MARL) plays a pivotal role in tackling real-world challenges. However, the seamless transition of trained policies from simulation to the real world requires them to be robust to various environmental uncertainties. Existing works focus on finding Nash Equilibrium or the optimal policy under uncertainty in a single environment variable (i.e. action, state or reward). This is because a multi-agent system is highly complex and non-stationary. However, in a real-world setting, uncertainty can occur in multiple environment variables simultaneously. This work is the first to formulate the generalised problem of robustness to multi-modal environment uncertainty in MARL. To this end, we propose a general robust training approach for multi-modal uncertainty based on curriculum learning techniques. We handle environmental uncertainty in more than one variable simultaneously and present extensive results across both cooperative and competitive MARL environments, demonstrating that our approach achieves state-of-the-art robustness. |
Aakriti Agrawal · Rohith Aralikatti · Yanchao Sun · Furong Huang 🔗 |
Sat 2:00 p.m. - 2:10 p.m.
|
Generation of Games for Opponent Model Differentiation
(
Spotlight
)
>
link
SlidesLive Video Protecting against adversarial attacks is a common multiagent problem in the real world. Attackers in the real world are predominantly human actors, and the protection methods often incorporate opponent models to improve the performance when facing humans. Previous results show that modeling human behavior can significantly improve the performance of the algorithms. However, modeling humans correctly is a complex problem, and the models are often simplified and assume humans make mistakes according to some distribution or train parameters for the whole population from which they sample. In this work, we use data gathered by psychologists who identified personality types that increase the likelihood of performing malicious acts. However, in the previous work, the tests on a handmade game could not show strategic differences between the models. We created a novel model that links its parameters to psychological traits. We optimized over parametrized games and created games in which the differences are profound. Our work can help with automatic game generation when we need a game in which some models will behave differently and to identify situations in which the models do not align. |
David Milec · Viliam Lisy · Christopher Kiekintveld 🔗 |
Sat 2:10 p.m. - 2:20 p.m.
|
Robust Q-Learning against State Perturbations: a Belief-Enriched Pessimistic Approach
(
Spotlight
)
>
link
SlidesLive Video Reinforcement learning (RL) has achieved phenomenal success in various domains. However, its data-driven nature also introduces new vulnerabilities that can be exploited by malicious opponents. Recent work shows that a well-trained RL agent can be easily manipulated by strategically perturbing its state observations at the test stage. Existing solutions either introduce a regularization term to improve the smoothness of the trained policy against perturbations or alternately train the agent's policy and the attacker's policy. However, the former does not provide sufficient protection against strong attacks, while the latter is computationally prohibitive for large environments. In this work, we propose a new robust RL algorithm for deriving a pessimistic policy to safeguard against an agent's uncertainty about true states. This approach is further enhanced with belief state inference and diffusion-based state purification to reduce uncertainty. Empirical results show that our approach obtains superb performance under strong attacks and has a training overhead comparable to that of regularization-based methods. |
Xiaolin Sun · Zizhan Zheng 🔗 |
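The idea of a pessimistic policy that guards against uncertainty about the true state can be sketched in a tabular setting: instead of acting greedily on the (possibly perturbed) observation, the agent picks the action whose worst-case value over all states consistent with that observation is highest. This is only an illustration of the maximin principle, not the authors' algorithm, and it omits belief inference and diffusion-based purification entirely. The integer state space, the `radius` perturbation model, and the Q-values are assumptions.

```python
def candidate_states(observed, radius, n_states):
    """States the agent might truly be in, given a perturbation of at most `radius`."""
    low = max(0, observed - radius)
    high = min(n_states - 1, observed + radius)
    return list(range(low, high + 1))


def greedy_action(q_table, observed):
    """Standard greedy choice that trusts the observation."""
    row = q_table[observed]
    return max(range(len(row)), key=lambda a: row[a])


def pessimistic_action(q_table, observed, radius):
    """Maximin choice: best action under the worst consistent true state."""
    states = candidate_states(observed, radius, len(q_table))
    n_actions = len(q_table[0])
    worst_case = [min(q_table[s][a] for s in states) for a in range(n_actions)]
    return max(range(n_actions), key=lambda a: worst_case[a])


if __name__ == "__main__":
    # Hypothetical Q-table: rows are states 0..2, columns are actions 0..1.
    # Action 0 looks best in states 0 and 1 but is catastrophic in state 2.
    q_table = [
        [1.0, 0.4],
        [0.9, 0.5],
        [-5.0, 0.3],
    ]
    print(greedy_action(q_table, observed=1))                  # 0
    print(pessimistic_action(q_table, observed=1, radius=1))   # 1
```

With `radius=0` the candidate set collapses to the observed state and the pessimistic choice coincides with the greedy one, which shows how shrinking the set of plausible true states reduces conservatism.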
Sat 2:20 p.m. - 2:30 p.m.
|
Stackelberg Games with Side Information
(
Spotlight
)
>
link
SlidesLive Video
We study an online learning setting in which a leader interacts with a sequence of followers over the course of $T$ rounds. At each round, the leader commits to a mixed strategy over actions, after which the follower best-responds. Such settings are referred to in the literature as Stackelberg games. Stackelberg games have received much interest from the community, in part due to their applicability to real-world security settings such as wildlife preservation and airport security. However, despite this recent interest, current models of Stackelberg games fail to take into consideration the fact that the players' optimal strategies often depend on external factors such as weather patterns, airport traffic, etc. We address this gap by allowing for player payoffs to depend on an external context, in addition to the actions taken by each player. We formalize this setting as a repeated Stackelberg game with side information and show that under this setting, it is impossible to achieve sublinear regret if both the sequence of contexts and the sequence of followers are chosen adversarially. Motivated by this impossibility result, we consider two natural relaxations: (1) stochastically chosen contexts with adversarially chosen followers and (2) stochastically chosen followers with adversarially chosen contexts. In each of these settings, we provide algorithms which obtain $\tilde{\mathcal{O}}(\sqrt{T})$ regret.
|
Keegan Harris · Steven Wu · Maria-Florina Balcan 🔗 |
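The single-round interaction described in this abstract, where the leader commits to a mixed strategy and the follower best-responds, can be sketched for a small matrix game as below. The sketch covers only one round with known payoffs and no context, so it does not implement the paper's online algorithms or regret guarantees. The payoff matrices, the grid resolution, and the tie-breaking rule in the leader's favor (the usual strong Stackelberg convention) are assumptions made for illustration.

```python
def expected_payoffs(p_top, payoff):
    """Expected payoff of each follower action when the leader plays row 0 w.p. p_top."""
    n_cols = len(payoff[0])
    return [p_top * payoff[0][j] + (1 - p_top) * payoff[1][j] for j in range(n_cols)]


def follower_best_response(p_top, leader_payoff, follower_payoff, tol=1e-9):
    """Follower maximizes its own payoff; ties are broken in the leader's favor."""
    follower_values = expected_payoffs(p_top, follower_payoff)
    leader_values = expected_payoffs(p_top, leader_payoff)
    best = max(follower_values)
    tied = [j for j, v in enumerate(follower_values) if best - v <= tol]
    return max(tied, key=lambda j: leader_values[j])


def best_commitment(leader_payoff, follower_payoff, grid=101):
    """Grid search over the leader's probability of playing row 0."""
    best_p, best_value = 0.0, float("-inf")
    for k in range(grid):
        p_top = k / (grid - 1)
        j = follower_best_response(p_top, leader_payoff, follower_payoff)
        value = expected_payoffs(p_top, leader_payoff)[j]
        if value > best_value:
            best_p, best_value = p_top, value
    return best_p, best_value


if __name__ == "__main__":
    # Hypothetical 2x2 game: rows are leader actions, columns are follower actions.
    leader_payoff = [[2, 4], [1, 3]]
    follower_payoff = [[1, 0], [0, 1]]
    print(best_commitment(leader_payoff, follower_payoff))  # (0.5, 3.5)
```

In this example the leader earns 2 by committing to the first row outright, but 3.5 by mixing evenly, because the mixed commitment makes the follower willing to play the column the leader prefers. A contextual version would index both payoff matrices by the observed context before running the same computation.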
Sat 2:30 p.m. - 2:40 p.m.
|
Assessing Risks of Using Autonomous Language Models in Military and Diplomatic Planning
(
Spotlight
)
>
link
SlidesLive Video The potential integration of autonomous agents in high-stakes military and foreign-policy decision-making has gained prominence, especially with the emergence of advanced generative AI models like GPT-4. This paper aims to scrutinize the behavior of multiple autonomous agents in simulated military and diplomacy scenarios, specifically focusing on their potential to escalate conflicts. Drawing on established international relations frameworks, we assessed the escalation potential of decisions made by these agents in different scenarios. Contrary to prior qualitative studies, our research provides both qualitative and quantitative insights. We find that there are significant differences in the models' predilections to escalate, with Claude 2 being the least aggressive model and GPT-4-Base the most aggressive. Our findings indicate that, even in seemingly neutral contexts, language-model-based autonomous agents occasionally opt for aggressive or provocative actions. This tendency intensifies in scenarios with predefined trigger events. Importantly, the patterns behind such escalatory behavior remain largely unpredictable. Furthermore, a qualitative analysis of the models' verbalized reasoning, particularly in the GPT-4-Base model, reveals concerning justifications. Given the high stakes involved in military and foreign-policy contexts, the deployment of such autonomous agents demands further examination and cautious consideration. |
Gabe Mukobi · Ann-Katrin Reuel · Juan-Pablo Rivera · Chandler Smith 🔗 |
Sat 2:40 p.m. - 2:50 p.m.
|
Decentralized agent-based modeling
(
Spotlight
)
>
link
The utility of agent-based models for practical decision making depends upon their ability to recreate populations with great detail and integrate real-world data streams. However, incorporating this data can be challenging due to privacy concerns. We alleviate this issue by introducing a paradigm for secure agent-based modeling. In particular, we leverage secure multi-party computation to enable decentralized agent-based simulation, calibration, and analysis. We believe this is a critical step towards making agent-based models scalable to real-world applications. |
Ayush Chopra · Arnau Quera-Bofarull · Nurullah Giray Kuru · Ramesh Raskar 🔗 |
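One basic building block of secure multi-party computation is additive secret sharing, which lets several parties compute the sum of their private values without any party seeing another's input. The sketch below simulates that in a single process, for example to aggregate a per-site count needed by a simulation. It is not the authors' protocol: the abstract does not specify one, and the modulus, the function names, and the honest-but-curious setting are assumptions.

```python
import secrets

MODULUS = 2**61 - 1  # assumed modulus; values and their sum must stay below it


def share(value, n_parties):
    """Split `value` into n random shares that sum to it modulo MODULUS."""
    shares = [secrets.randbelow(MODULUS) for _ in range(n_parties - 1)]
    last = (value - sum(shares)) % MODULUS
    return shares + [last]


def secure_sum(private_values):
    """Simulate a secure sum: each party only ever sees one share of each input."""
    n = len(private_values)
    # all_shares[i][j] is the share of party i's value that is sent to party j.
    all_shares = [share(v, n) for v in private_values]
    # Each party j adds up the shares it received and publishes only that total.
    partial_sums = [sum(all_shares[i][j] for i in range(n)) % MODULUS
                    for j in range(n)]
    # Combining the published partial sums reveals the aggregate and nothing else.
    return sum(partial_sums) % MODULUS


if __name__ == "__main__":
    # Hypothetical private counts held by three data owners.
    print(secure_sum([12, 30, 7]))  # 49
```

Any subset of fewer than all shares of a value is uniformly random, so an individual share carries no information about the input it came from; only the final aggregate is disclosed.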
Sat 2:50 p.m. - 3:00 p.m.
|
Safe Equilibrium
(
Spotlight
)
>
link
SlidesLive Video The standard game-theoretic solution concept, Nash equilibrium, assumes that all players behave rationally. If we follow a Nash equilibrium and opponents are irrational (or follow strategies from a different Nash equilibrium), then we may obtain an extremely low payoff. On the other hand, a maximin strategy assumes that all opposing agents are playing to minimize our payoff (even if it is not in their best interest), and ensures the maximal possible worst-case payoff, but results in exceedingly conservative play. We propose a new solution concept called safe equilibrium that models opponents as behaving rationally with a specified probability and behaving potentially arbitrarily with the remaining probability. We prove that a safe equilibrium exists in all strategic-form games (for all possible values of the rationality parameters), and prove that its computation is PPAD-hard. |
Samuel Ganzfried 🔗 |
Sat 3:00 p.m. - 3:30 p.m.
|
Closing Remarks and Award Ceremony
(
Closing Remarks
)
>
SlidesLive Video |
🔗 |
-
|
Robustness to Multi-Modal Environment Uncertainty in MARL using Curriculum Learning
(
Poster
)
>
link
Multi-agent reinforcement learning (MARL) plays a pivotal role in tackling real-world challenges. However, the seamless transition of trained policies from simulation to the real world requires them to be robust to various environmental uncertainties. Existing works focus on finding Nash Equilibrium or the optimal policy under uncertainty in a single environment variable (i.e. action, state or reward). This is because a multi-agent system is highly complex and non-stationary. However, in a real-world setting, uncertainty can occur in multiple environment variables simultaneously. This work is the first to formulate the generalised problem of robustness to multi-modal environment uncertainty in MARL. To this end, we propose a general robust training approach for multi-modal uncertainty based on curriculum learning techniques. We handle environmental uncertainty in more than one variable simultaneously and present extensive results across both cooperative and competitive MARL environments, demonstrating that our approach achieves state-of-the-art robustness. |
Aakriti Agrawal · Rohith Aralikatti · Yanchao Sun · Furong Huang 🔗 |
-
|
Defining and Mitigating Collusion in Multi-Agent Systems
(
Poster
)
>
link
Collusion between learning agents is increasingly becoming a topic of concern with the advent of more powerful, complex multi-agent systems. In contrast to existing work in narrow settings, we present a general formalisation of collusion between learning agents in partially-observable stochastic games. We discuss methods for intervening on a game to mitigate collusion and provide theoretical as well as empirical results demonstrating the effectiveness of three such interventions. |
Jack Foxabbott · Sam Deverett · Kaspar Senft · Samuel Dower · Lewis Hammond 🔗 |
-
|
Multiagent Simulators for Social Networks
(
Poster
)
>
link
Multiagent social network simulations are an avenue that can bridge the communication gap between the public and private platforms in order to develop solutions to a complex array of issues relating to online safety. While there are significant challenges relating to the scale of multiagent simulations and to efficient learning from observational and interventional data to accurately model micro- and macro-level emergent effects, there are equally promising opportunities, not least with the advent of large language models that provide an expressive approximation of user behavior. In this position paper, we review prior art relating to social network simulation, highlighting challenges and opportunities for future work exploring multiagent security using agent-based models of social networks. |
Aditya Surve · Archit Rathod · Mokshit Surana · Gautam Malpani · Aneesh Shamraj · SAINATH SANKEPALLY · Raghav Jain · Swapneel Mehta 🔗 |
-
|
Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning
(
Poster
)
>
link
Stackelberg equilibria arise naturally in a range of popular learning problems, such as in security games or indirect mechanism design, and have received increasing attention in the reinforcement learning literature. We present a general framework for implementing Stackelberg equilibria search as a multi-agent RL problem, allowing a wide range of algorithmic design choices. We discuss how previous approaches can be seen as specific instantiations of this framework. As a key insight, we note that the design space allows for approaches not previously seen in the literature, for instance by leveraging multitask and meta-RL techniques for follower convergence. We propose one such approach using contextual policies, and evaluate it experimentally on both standard and novel benchmark domains, showing greatly improved sample efficiency compared to previous approaches. Finally, we explore the effect of adopting algorithm designs outside the borders of our framework. |
Matthias Gerstgrasser · David Parkes 🔗 |
-
|
Dynamics Model Based Adversarial Training For Competitive Reinforcement Learning
(
Poster
)
>
link
Adversarial perturbations substantially degrade the performance of Deep Reinforcement Learning (DRL) agents, reducing the applicability of DRL in practice. Existing adversarial training for robustifying DRL uses information about the agent at the current step to minimize the loss upper bound introduced by adversarial input perturbations. However, it only works well for single-agent tasks. The enhanced controversy in two-agent games introduces more dynamics and makes existing methods less effective. Inspired by model-based RL, which builds a model of the environment transition probability, we propose a dynamics-model-based adversarial training framework for modeling multi-step state transitions. Our dynamics model transitively predicts future states, which can provide more precise back-propagated future information during adversarial perturbation generation and hence substantially improve the agent's empirical robustness under different attacks. Our experiments on four two-agent competitive MuJoCo games show that our method consistently outperforms state-of-the-art adversarial training techniques in terms of empirical robustness and normal functionalities of DRL agents. |
Xuan Chen · Guanhong Tao · Xiangyu Zhang 🔗 |
-
|
Beyond Worst-case Attacks: Robust RL with Adaptive Defense via Non-dominated Policies
(
Poster
)
>
link
Considerable focus has been directed towards ensuring that reinforcement learning (RL) policies are robust to adversarial attacks during test time. While current approaches are effective against strong attacks for potential worst-case scenarios, these methods often compromise performance in the absence of attacks or the presence of only weak attacks. To address this, we study policy robustness under the well-accepted state-adversarial attack model, extending our focus beyond merely worst-case attacks. We refine the baseline policy class $\Pi$ prior to test time, aiming for efficient adaptation within a compact, finite policy class $\tilde{\Pi}$, which can resort to an adversarial bandit subroutine. We then propose a novel training-time algorithm to iteratively discover non-dominated policies, forming a near-optimal and minimal $\tilde{\Pi}$. Empirical validation on MuJoCo corroborates the superiority of our approach in terms of natural and robust performance, as well as adaptability to various attack scenarios.
|
Xiangyu Liu · Chenghao Deng · Yanchao Sun · Yongyuan Liang · Furong Huang 🔗 |
-
|
Generation of Games for Opponent Model Differentiation
(
Poster
)
>
link
Protecting against adversarial attacks is a common multiagent problem in the real world. Attackers in the real world are predominantly human actors, and the protection methods often incorporate opponent models to improve the performance when facing humans. Previous results show that modeling human behavior can significantly improve the performance of the algorithms. However, modeling humans correctly is a complex problem, and the models are often simplified and assume humans make mistakes according to some distribution or train parameters for the whole population from which they sample. In this work, we use data gathered by psychologists who identified personality types that increase the likelihood of performing malicious acts. However, in the previous work, the tests on a handmade game could not show strategic differences between the models. We created a novel model that links its parameters to psychological traits. We optimized over parametrized games and created games in which the differences are profound. Our work can help with automatic game generation when we need a game in which some models will behave differently and to identify situations in which the models do not align. |
David Milec · Viliam Lisy · Christopher Kiekintveld 🔗 |
-
|
Language Agents as Hackers: Evaluating Cybersecurity Skills with Capture the Flag
(
Poster
)
>
link
Amidst the advent of language models (LMs) and their wide-ranging capabilities, concerns have been raised about their implications with regard to privacy and security. In particular, the emergence of language agents as a promising aid for automating and augmenting digital work poses immediate questions concerning their misuse as malicious cybersecurity actors. With their exceptional compute efficiency and execution speed relative to human counterparts, language agents may be extremely adept at locating vulnerabilities, performing complex social engineering, and hacking real-world systems. Understanding and guiding the development of language agents in the cybersecurity space requires a grounded understanding of their capabilities founded on empirical data and demonstrations. To address this need, we introduce InterCode-CTF, a novel task environment and benchmark for evaluating language agents on the Capture the Flag (CTF) task. The InterCode-CTF environment is built as a facsimile of real-world CTF competitions: a language agent is tasked with finding a flag from a purposely vulnerable computer program. We manually collect and verify a benchmark of 100 task instances that require a number of cybersecurity skills such as reverse engineering, forensics, and binary exploitation, then evaluate current top-notch LMs on this evaluation set. Our preliminary findings indicate that while language agents possess rudimentary cybersecurity knowledge, they are not able to perform multi-step cybersecurity tasks out-of-the-box. |
John Yang · Akshara Prabhakar · Shunyu Yao · Kexin Pei · Karthik Narasimhan 🔗 |
-
|
Second-order Jailbreaks: Generative Agents Successfully Manipulate Through an Intermediary
(
Poster
)
>
link
As the capabilities of Large Language Models (LLMs) continue to expand, their application in communication tasks is becoming increasingly prevalent. However, this widespread use brings with it novel risks, including the susceptibility of LLMs to "jailbreaking" techniques. In this paper, we explore the potential for such risks in two- and three-agent communication networks, where one agent is tasked with protecting a password while another attempts to uncover it. Our findings reveal that an attacker, powered by advanced LLMs, can extract the password even through an intermediary that is instructed to prevent this. Our contributions include an experimental setup for evaluating the persuasiveness of LLMs, a demonstration of LLMs' ability to manipulate each other into revealing protected information, and a comprehensive analysis of this manipulative behavior. Our results underscore the need for further investigation into the safety and security of LLMs in communication networks. |
Mikhail Terekhov · Romain Graux · Eduardo Neville · Denis Rosset · Gabin Kolly 🔗 |
-
|
RAVE: Enabling safety verification for realistic deep reinforcement learning systems
(
Poster
)
>
link
Recent advancements in reinforcement learning (RL) have expedited its success across a wide range of decision-making problems. However, a lack of safety guarantees restricts its use in critical tasks. While recent work has proposed several verification techniques to provide such guarantees, they require that the state-transition function be known and the reinforcement learning policy be deterministic. Neither property is guaranteed to hold in real environments, which significantly limits the use of existing verification techniques. In this work, we propose two approximation strategies that address this limitation of prior work, allowing the safety verification of RL policies. We demonstrate that by augmenting state-of-the-art verification techniques with our proposed approximation strategies, we can guarantee the safety of non-deterministic RL policies operating in environments with unknown state-transition functions. We theoretically prove that our technique guarantees the safety of an RL policy at runtime. Our experiments on three representative RL tasks empirically verify the efficacy of our method in providing a safety guarantee to a target agent while maintaining its task execution performance. |
Wenbo Guo · Taesung Lee · Kevin Eykholt · Jiyong Jang 🔗 |
-
|
Cooperative AI via Decentralized Commitment Devices
(
Poster
)
>
link
Credible commitment devices have been a popular approach for robust multi-agent coordination. However, existing commitment mechanisms face limitations in privacy and integrity, and are susceptible to strategic behavior by mediators or users. It is unclear if the cooperative AI techniques we study are robust to real-world incentives and attack vectors. Fortunately, decentralized commitment devices that utilize cryptography have been deployed in the wild, and numerous studies have shown their ability to coordinate algorithmic agents, especially when agents face rational or sometimes adversarial opponents with significant economic incentives, currently on the order of several million to billions of dollars. In this paper, we illustrate potential security issues in cooperative AI via examples in the decentralization literature and, in particular, Maximal Extractable Value (MEV). We call for expanded research into decentralized commitments to advance cooperative AI capabilities for secure coordination in open environments, and for empirical testing frameworks to evaluate multi-agent coordination ability given real-world commitment constraints. |
Xyn Sun · Davide Crapis · Matt Stephenson · Jonathan Passerat-Palmbach 🔗 |
-
|
Robust Q-Learning against State Perturbations: a Belief-Enriched Pessimistic Approach
(
Poster
)
>
link
Reinforcement learning (RL) has achieved phenomenal success in various domains. However, its data-driven nature also introduces new vulnerabilities that can be exploited by malicious opponents. Recent work shows that a well-trained RL agent can be easily manipulated by strategically perturbing its state observations at the test stage. Existing solutions either introduce a regularization term to improve the smoothness of the trained policy against perturbations or alternately train the agent's policy and the attacker's policy. However, the former does not provide sufficient protection against strong attacks, while the latter is computationally prohibitive for large environments. In this work, we propose a new robust RL algorithm for deriving a pessimistic policy to safeguard against an agent's uncertainty about true states. This approach is further enhanced with belief state inference and diffusion-based state purification to reduce uncertainty. Empirical results show that our approach obtains superb performance under strong attacks and has a training overhead comparable to that of regularization-based methods. |
Xiaolin Sun · Zizhan Zheng 🔗 |
-
|
Assessing Risks of Using Autonomous Language Models in Military and Diplomatic Planning
(
Poster
)
>
link
The potential integration of autonomous agents in high-stakes military and foreign-policy decision-making has gained prominence, especially with the emergence of advanced generative AI models like GPT-4. This paper aims to scrutinize the behavior of multiple autonomous agents in simulated military and diplomacy scenarios, specifically focusing on their potential to escalate conflicts. Drawing on established international relations frameworks, we assess the escalation potential of decisions made by these agents in different scenarios. In contrast to prior qualitative studies, our research provides both qualitative and quantitative insights. We find that there are significant differences in the models' predilections to escalate, with Claude 2 being the least aggressive model and GPT-4-Base the most aggressive. Our findings indicate that, even in seemingly neutral contexts, language-model-based autonomous agents occasionally opt for aggressive or provocative actions. This tendency intensifies in scenarios with predefined trigger events. Importantly, the patterns behind such escalatory behavior remain largely unpredictable. Furthermore, a qualitative analysis of the models' verbalized reasoning, particularly in the GPT-4-Base model, reveals concerning justifications. Given the high stakes involved in military and foreign-policy contexts, the deployment of such autonomous agents demands further examination and cautious consideration. |
Gabe Mukobi · Ann-Katrin Reuel · Juan-Pablo Rivera · Chandler Smith 🔗 |
-
|
Stackelberg Games with Side Information
(
Poster
)
>
link
We study an online learning setting in which a leader interacts with a sequence of followers over the course of $T$ rounds. At each round, the leader commits to a mixed strategy over actions, after which the follower best-responds. Such settings are referred to in the literature as Stackelberg games. Stackelberg games have received much interest from the community, in part due to their applicability to real-world security settings such as wildlife preservation and airport security. However, despite this recent interest, current models of Stackelberg games fail to take into consideration the fact that the players' optimal strategies often depend on external factors such as weather patterns, airport traffic, etc. We address this gap by allowing player payoffs to depend on an external context, in addition to the actions taken by each player. We formalize this setting as a repeated Stackelberg game with side information and show that, in this setting, it is impossible to achieve sublinear regret if both the sequence of contexts and the sequence of followers are chosen adversarially. Motivated by this impossibility result, we consider two natural relaxations: (1) stochastically chosen contexts with adversarially chosen followers and (2) stochastically chosen followers with adversarially chosen contexts. In each of these settings, we provide algorithms which obtain $\tilde{\mathcal{O}}(\sqrt{T})$ regret.
|
Keegan Harris · Steven Wu · Maria-Florina Balcan 🔗 |
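The single round described in the abstract above (the leader commits to a mixed strategy, then the follower best-responds, with payoffs depending on an external context) can be sketched as follows. This is only an illustrative sketch, not the authors' algorithm: the payoff matrices, the context names `"clear"` and `"storm"`, and the function names are all hypothetical, and ties are broken toward the lowest follower action index as a simplification.

```python
# Hypothetical payoffs, indexed [leader_action][follower_action].
LEADER_PAYOFFS = [[2, 0], [0, 1]]

# Assumption for illustration: the context only changes the follower's payoffs.
FOLLOWER_PAYOFFS_BY_CONTEXT = {
    "clear": [[0, 1], [1, 0]],
    "storm": [[1, 0], [0, 1]],
}


def expected_payoff(mixed_strategy, payoffs, follower_action):
    """Expected payoff when the leader mixes and the follower plays one action."""
    return sum(p * row[follower_action] for p, row in zip(mixed_strategy, payoffs))


def follower_best_response(mixed_strategy, follower_payoffs):
    """Follower action maximizing its expected payoff (lowest index on ties)."""
    n_actions = len(follower_payoffs[0])
    return max(
        range(n_actions),
        key=lambda j: expected_payoff(mixed_strategy, follower_payoffs, j),
    )


def leader_utility(mixed_strategy, context):
    """Leader's expected payoff after the follower best-responds in this context."""
    follower_payoffs = FOLLOWER_PAYOFFS_BY_CONTEXT[context]
    j = follower_best_response(mixed_strategy, follower_payoffs)
    return expected_payoff(mixed_strategy, LEADER_PAYOFFS, j)


commitment = [0.75, 0.25]  # leader plays action 0 with probability 0.75
for ctx in ("clear", "storm"):
    print(ctx, leader_utility(commitment, ctx))
```

With the same commitment, the follower's best response (and so the leader's utility) differs between the two contexts, which is the dependence on side information that the abstract argues standard models omit.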
-
|
Safe Equilibrium
(
Poster
)
>
link
The standard game-theoretic solution concept, Nash equilibrium, assumes that all players behave rationally. If we follow a Nash equilibrium and opponents are irrational (or follow strategies from a different Nash equilibrium), then we may obtain an extremely low payoff. On the other hand, a maximin strategy assumes that all opposing agents are playing to minimize our payoff (even if it is not in their best interest), and ensures the maximal possible worst-case payoff, but results in exceedingly conservative play. We propose a new solution concept called safe equilibrium that models opponents as behaving rationally with a specified probability and behaving potentially arbitrarily with the remaining probability. We prove that a safe equilibrium exists in all strategic-form games (for all possible values of the rationality parameters), and prove that its computation is PPAD-hard. |
Samuel Ganzfried 🔗 |
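The trade-off motivating the abstract above, between playing for a rational opponent and guarding against the worst case, can be illustrated with a small sketch. This is not the paper's definition of safe equilibrium: the payoff matrix, the assumed rational opponent action, and the blended score (a probability-weighted mix of the payoff against a rational opponent and the worst-case payoff) are hypothetical simplifications restricted to pure strategies.

```python
# Hypothetical row-player payoffs, indexed [our_action][opponent_action].
PAYOFFS = [
    [10, -100],  # strong if the opponent plays column 0, disastrous otherwise
    [1, 1],      # safe: the same payoff whatever the opponent does
]


def worst_case_payoff(payoffs, row):
    """Payoff of a pure action if the opponent plays to minimize our payoff."""
    return min(payoffs[row])


def maximin_pure(payoffs):
    """Pure action with the best worst-case payoff, and that payoff."""
    best = max(range(len(payoffs)), key=lambda r: worst_case_payoff(payoffs, r))
    return best, worst_case_payoff(payoffs, best)


def blended_score(payoffs, row, rational_col, p):
    """Assumed score: rational opponent with probability p, worst case otherwise."""
    return p * payoffs[row][rational_col] + (1 - p) * worst_case_payoff(payoffs, row)


def best_blended_row(payoffs, rational_col, p):
    """Pure action maximizing the blended score for rationality parameter p."""
    return max(
        range(len(payoffs)),
        key=lambda r: blended_score(payoffs, r, rational_col, p),
    )


print(maximin_pure(PAYOFFS))
for p in (0.5, 0.95):
    print(p, best_blended_row(PAYOFFS, rational_col=0, p=p))
```

Under these assumptions, a low rationality probability selects the conservative maximin action, while a high one selects the action that is best against a rational opponent.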
-
|
Leading the Pack: N-player Opponent Shaping
(
Poster
)
>
link
Reinforcement learning solutions have had great success in the 2-player general-sum setting. In this setting, the paradigm of Opponent Shaping (OS), in which agents account for the learning of their co-players, has led to agents which are able to avoid collectively bad outcomes whilst also maximizing their reward. These methods have so far been limited to 2-player games. However, the real world involves interactions with many more agents, with interactions on both local and global scales. In this paper, we extend OS methods to environments involving multiple co-players and multiple shaping agents. We evaluate on 4 different environments, varying the number of players from 3 to 5, and demonstrate that model-based OS methods converge to equilibria with better global welfare than naive learning. However, we find that when playing with a large number of co-players, the relative performance of OS methods decreases, suggesting that in the limit OS methods may not perform well. Finally, we explore scenarios where more than one OS method is present, noticing that within games requiring a majority of cooperating agents, OS methods converge to outcomes with poor global welfare. |
Alexandra Souly · Timon Willi · Akbir Khan · Robert Kirk · Chris Lu · Edward Grefenstette · Tim Rocktäschel 🔗 |
-
|
Harnessing the Power of Federated Learning in Federated Contextual Bandits
(
Poster
)
>
link
Federated contextual bandits (FCB), as a pivotal instance of combining federated learning (FL) and sequential decision-making, have received growing interest in recent years. However, existing FCB designs often adopt FL protocols tailored for specific settings, deviating from the canonical FL framework. Such disconnections not only prohibit these designs from flexibly leveraging canonical FL algorithmic approaches but also set considerable barriers for FCB to incorporate growing studies on FL attributes such as robustness and privacy. To promote a closer relationship between FL and FCB, we propose a novel FCB design, FedIGW, which can flexibly incorporate both existing and future FL protocols and thus is capable of harnessing the full spectrum of FL advances. |
Chengshuai Shi · Kun Yang · Ruida Zhou · Cong Shen 🔗 |
-
|
Decentralized agent-based modeling
(
Poster
)
>
link
The utility of agent-based models for practical decision making depends upon their ability to recreate populations in great detail and integrate real-world data streams. However, incorporating this data can be challenging due to privacy concerns. We alleviate this issue by introducing a paradigm for secure agent-based modeling. In particular, we leverage secure multi-party computation to enable decentralized agent-based simulation, calibration, and analysis. We believe this is a critical step towards making agent-based models scalable to real-world applications. |
Ayush Chopra · Arnau Quera-Bofarull · Nurullah Giray Kuru · Ramesh Raskar 🔗 |
-
|
I See You! Robust Measurement of Adversarial Behavior
(
Poster
)
>
link
We introduce the study of non-manipulable measures of manipulative behavior in multi-agent systems. We do this through a case study of decentralized finance (DeFi) and blockchain systems, which are salient as real-world, rapidly emerging multi-agent systems that combine financial incentives for malicious behavior, the participation of algorithmic and AI systems, and a need for new methods with which to measure levels of manipulative behavior. We introduce a new surveillance metric for measuring malicious behavior and demonstrate its effectiveness in a natural experiment on the Uniswap DeFi ecosystem. |
Lars Ankile · Matheus Xavier Ferreira · David Parkes 🔗 |