This workshop proposal builds on the observation that the AI and cyber security communities are currently not sufficiently interconnected to navigate risks and opportunities in our multi-agent world. Through a series of discussions involving experts and audiences, provocation and intervention keynotes, and contributed content, we aim to compare, contrast, and synthesize near- and long-term perspectives on AI deployment across society. The fundamental goal of this workshop is to bring together researchers, practitioners, and activists across AI and cyber security to create a blueprint for the future of AI security in a multi-agent world, and to define, explore, and challenge the nascent field of multi-agent security (MASEC).
Submission deadline: September 25, 2023
Acceptance Notification: October 27, 2023
Workshop date: December 16, 2023
Sat 7:00 a.m. - 7:10 a.m. | Opening Remarks (Moderated)
Sat 7:35 a.m. - 8:00 a.m. | TBA (Keynote) | Volkan Cevher
Sat 9:30 a.m. - 10:30 a.m. | Panel Debate
Sat 10:30 a.m. - 12:00 p.m. | Poster Session
Sat 3:00 p.m. - 3:30 p.m. | Closing Remarks and Award Ceremony
- Robustness to Multi-Modal Environment Uncertainty in MARL using Curriculum Learning (Poster)
Multi-agent reinforcement learning (MARL) plays a pivotal role in tackling real-world challenges. However, the seamless transition of trained policies from simulation to the real world requires them to be robust to various environmental uncertainties. Existing works focus on finding Nash equilibria or the optimal policy under uncertainty in a single environment variable (i.e., action, state, or reward), because a multi-agent system is highly complex and non-stationary. In a real-world setting, however, uncertainty can occur in multiple environment variables simultaneously. This work is the first to formulate the generalised problem of robustness to multi-modal environment uncertainty in MARL. To this end, we propose a general robust training approach for multi-modal uncertainty based on curriculum learning techniques. We handle environmental uncertainty in more than one variable simultaneously and present extensive results across both cooperative and competitive MARL environments, demonstrating that our approach achieves state-of-the-art robustness.
Aakriti Agrawal · Rohith Aralikatti · Yanchao Sun · Furong Huang
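To make the curriculum idea concrete, the following is a minimal, hypothetical sketch (not the authors' code) of a training loop that ramps up noise in several environment variables simultaneously; the wrapper, the gym-style step API, the noise scales, and the linear schedule are all illustrative assumptions.

```python
import numpy as np

def uncertainty_schedule(step, total_steps, max_scale):
    """Linearly ramp a noise scale from 0 to max_scale (a simple curriculum)."""
    return max_scale * min(1.0, step / total_steps)

class MultiModalNoiseWrapper:
    """Hypothetical wrapper injecting Gaussian noise into actions, observations, and rewards."""
    def __init__(self, env, rng=None):
        self.env = env
        self.rng = rng or np.random.default_rng(0)
        self.scales = {"obs": 0.0, "action": 0.0, "reward": 0.0}

    def set_scales(self, obs, action, reward):
        self.scales.update(obs=obs, action=action, reward=reward)

    def step(self, action):
        # Assumes a gym-style (obs, reward, done, info) step API.
        noisy_action = action + self.rng.normal(0, self.scales["action"], size=np.shape(action))
        obs, rew, done, info = self.env.step(noisy_action)
        obs = obs + self.rng.normal(0, self.scales["obs"], size=np.shape(obs))
        rew = rew + self.rng.normal(0, self.scales["reward"])
        return obs, rew, done, info

# Curriculum: all three uncertainty modes are increased together during training.
# for step in range(total_steps):
#     s = uncertainty_schedule(step, total_steps, max_scale=0.3)
#     wrapper.set_scales(obs=s, action=s, reward=s)
#     ...run one MARL training iteration on the wrapped environment...
```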
- Defining and Mitigating Collusion in Multi-Agent Systems (Poster)
Collusion between learning agents is increasingly becoming a topic of concern with the advent of more powerful, complex multi-agent systems. In contrast to existing work in narrow settings, we present a general formalisation of collusion between learning agents in partially-observable stochastic games. We discuss methods for intervening on a game to mitigate collusion and provide theoretical as well as empirical results demonstrating the effectiveness of three such interventions.
Jack Foxabbott · Sam Deverett · Kaspar Senft · Samuel Dower · Lewis Hammond
- Multiagent Simulators for Social Networks (Poster)
Multiagent social network simulations are an avenue that can bridge the communication gap between the public and private platforms in order to develop solutions to a complex array of issues relating to online safety. While there are significant challenges relating to the scale of multiagent simulations and to efficient learning from observational and interventional data to accurately model micro- and macro-level emergent effects, there are equally promising opportunities, not least with the advent of large language models that provide an expressive approximation of user behavior. In this position paper, we review prior art relating to social network simulation, highlighting challenges and opportunities for future work exploring multiagent security using agent-based models of social networks.
Aditya Surve · Archit Rathod · Mokshit Surana · Gautam Malpani · Aneesh Shamraj · SAINATH SANKEPALLY · Raghav Jain · Swapneel Mehta
- Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning (Poster)
Stackelberg equilibria arise naturally in a range of popular learning problems, such as in security games or indirect mechanism design, and have received increasing attention in the reinforcement learning literature. We present a general framework for implementing Stackelberg equilibria search as a multi-agent RL problem, allowing a wide range of algorithmic design choices. We discuss how previous approaches can be seen as specific instantiations of this framework. As a key insight, we note that the design space allows for approaches not previously seen in the literature, for instance by leveraging multitask and meta-RL techniques for follower convergence. We propose one such approach using contextual policies, and evaluate it experimentally on both standard and novel benchmark domains, showing greatly improved sample efficiency compared to previous approaches. Finally, we explore the effect of adopting algorithm designs outside the borders of our framework.
Matthias Gerstgrasser · David Parkes
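As a toy illustration of the leader-follower structure (not the paper's framework, which works with RL and contextual policies), the sketch below computes a Stackelberg commitment in a small matrix game by searching over leader mixed strategies and letting an oracle follower best-respond; the payoff matrices and grid resolution are assumptions.

```python
import numpy as np

# Payoff matrices: rows = leader actions, columns = follower actions (illustrative game).
leader_payoff = np.array([[2.0, 0.0], [3.0, 1.0]])
follower_payoff = np.array([[1.0, 0.0], [0.0, 2.0]])

def follower_best_response(x):
    """Oracle follower: best-responds to the leader's committed mixed strategy x."""
    expected = x @ follower_payoff          # expected follower payoff per follower action
    return int(np.argmax(expected))

best_x, best_value = None, -np.inf
for p in np.linspace(0.0, 1.0, 101):        # grid over leader mixed strategies
    x = np.array([p, 1.0 - p])
    b = follower_best_response(x)
    value = x @ leader_payoff[:, b]          # leader's payoff given the follower's response
    if value > best_value:
        best_x, best_value = x, value

print("Leader commitment:", best_x, "leader value:", best_value)
```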
- Dynamics Model Based Adversarial Training For Competitive Reinforcement Learning (Poster)
Adversarial perturbations substantially degrade the performance of Deep Reinforcement Learning (DRL) agents, reducing the applicability of DRL in practice. Existing adversarial training for robustifying DRL uses the agent's information at the current step to minimize the loss upper bound introduced by adversarial input perturbations. However, it only works well for single-agent tasks. The heightened competition in two-agent games introduces more complex dynamics and makes existing methods less effective. Inspired by model-based RL, which builds a model of the environment transition probability, we propose a dynamics model based adversarial training framework for modeling multi-step state transitions. Our dynamics model transitively predicts future states, which can provide more precise back-propagated future information during adversarial perturbation generation, and hence substantially improve the agent's empirical robustness under different attacks. Our experiments on four two-agent competitive MuJoCo games show that our method consistently outperforms state-of-the-art adversarial training techniques in terms of both the empirical robustness and the normal functionality of DRL agents.
Xuan Chen · Guanhong Tao · Xiangyu Zhang
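To illustrate how a learned dynamics model can feed multi-step information into perturbation generation, here is a minimal one-step FGSM-style sketch (our illustration, not the paper's algorithm); the interfaces (policy, dynamics, and value_fn as differentiable PyTorch modules) are assumptions.

```python
import torch

def multistep_perturbation(state, policy, dynamics, value_fn, eps=0.05, horizon=3):
    """Illustrative FGSM-style perturbation that uses a learned dynamics model to
    look several steps ahead when attacking the victim's observed state."""
    delta = torch.zeros_like(state, requires_grad=True)
    s = state + delta
    total_value = 0.0
    for _ in range(horizon):
        a = policy(s)                  # victim acts on the (perturbed) predicted state
        total_value = total_value + value_fn(s, a)
        s = dynamics(s, a)             # transitively predict the next state
    total_value.backward()
    # Move the perturbation in the direction that decreases the victim's predicted value.
    return (state - eps * delta.grad.sign()).detach()
```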
- Beyond Worst-case Attacks: Robust RL with Adaptive Defense via Non-dominated Policies (Poster)
Considerable focus has been directed towards ensuring that reinforcement learning (RL) policies are robust to adversarial attacks during test time. While current approaches are effective against strong attacks in potential worst-case scenarios, these methods often compromise performance in the absence of attacks or in the presence of only weak attacks. To address this, we study policy robustness under the well-accepted state-adversarial attack model, extending our focus beyond merely worst-case attacks. We \textit{refine} the baseline policy class $\Pi$ prior to test time, aiming for efficient adaptation within a compact, finite policy class $\tilde{\Pi}$, which can resort to an adversarial bandit subroutine. We then propose a novel training-time algorithm to iteratively discover \textit{non-dominated policies}, forming a near-optimal and minimal $\tilde{\Pi}$. Empirical validation on MuJoCo corroborates the superiority of our approach in terms of natural and robust performance, as well as adaptability to various attack scenarios.
Xiangyu Liu · Chenghao Deng · Yanchao Sun · Yongyuan Liang · Furong Huang
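The "adversarial bandit subroutine" over a finite refined policy class can be instantiated, for example, with EXP3. The sketch below is a generic EXP3 implementation (a standard algorithm, not necessarily the authors' exact choice), where each arm corresponds to one candidate policy and the reward is the episodic return observed at test time.

```python
import numpy as np

class Exp3:
    """Standard EXP3 adversarial-bandit algorithm over K arms (here: K candidate policies)."""
    def __init__(self, k, eta=0.1, rng=None):
        self.k, self.eta = k, eta
        self.weights = np.ones(k)
        self.rng = rng or np.random.default_rng(0)

    def probabilities(self):
        return self.weights / self.weights.sum()

    def select(self):
        self.p = self.probabilities()
        self.arm = self.rng.choice(self.k, p=self.p)
        return self.arm

    def update(self, reward):
        """reward should be normalized to [0, 1] (e.g., a scaled episodic return)."""
        estimate = reward / self.p[self.arm]      # importance-weighted reward estimate
        self.weights[self.arm] *= np.exp(self.eta * estimate)

# Usage sketch: at test time, pick a policy per episode and feed back its return.
# bandit = Exp3(k=len(policy_class))
# for episode in range(num_episodes):
#     pi = policy_class[bandit.select()]
#     bandit.update(run_episode(pi))   # run_episode assumed to return a value in [0, 1]
```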
- Generation of Games for Opponent Model Differentiation (Poster)
Protecting against adversarial attacks is a common multiagent problem in the real world. Attackers in the real world are predominantly human actors, and protection methods often incorporate opponent models to improve performance when facing humans. Previous results show that modeling human behavior can significantly improve the performance of the algorithms. However, modeling humans correctly is a complex problem, and the models are often simplified: they assume humans make mistakes according to some distribution, or they train parameters for the whole population from which individuals are sampled. In this work, we use data gathered by psychologists who identified personality types that increase the likelihood of performing malicious acts. In previous work, however, tests on a handmade game could not show strategic differences between the models. We created a novel model that links its parameters to psychological traits, optimized over parametrized games, and created games in which the differences are profound. Our work can help with automatic game generation when we need a game in which given models behave differently, and with identifying situations in which the models do not align.
David Milec · Viliam Lisy · Christopher Kiekintveld
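A minimal sketch of the "optimize over parametrized games" idea (illustrative only; the paper's opponent models are tied to psychological traits, which are not reproduced here): search over the payoffs of a small game so that two simple quantal-response-style opponent models choose actions with maximally different probabilities.

```python
import numpy as np

def quantal_response(payoffs, temperature):
    """Softmax opponent model: lower temperature means more rational play."""
    z = payoffs / temperature
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

def model_divergence(payoffs, temp_a=0.2, temp_b=2.0):
    """Total-variation distance between the two models' action distributions."""
    return 0.5 * np.abs(quantal_response(payoffs, temp_a) - quantal_response(payoffs, temp_b)).sum()

rng = np.random.default_rng(0)
best_game, best_score = None, -1.0
for _ in range(5000):                             # random search over parametrized payoffs
    payoffs = rng.uniform(-1.0, 1.0, size=3)      # a 3-action game for the opponent
    score = model_divergence(payoffs)
    if score > best_score:
        best_game, best_score = payoffs, score

print("Payoffs that best differentiate the two models:", best_game, best_score)
```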
- Language Agents as Hackers: Evaluating Cybersecurity Skills with Capture the Flag (Poster)
Amidst the advent of language models (LMs) and their wide-ranging capabilities, concerns have been raised about their implications with regard to privacy and security. In particular, the emergence of language agents as a promising aid for automating and augmenting digital work poses immediate questions concerning their misuse as malicious cybersecurity actors. With their exceptional compute efficiency and execution speed relative to human counterparts, language agents may be extremely adept at locating vulnerabilities, performing complex social engineering, and hacking real-world systems. Understanding and guiding the development of language agents in the cybersecurity space requires a grounded understanding of their capabilities founded on empirical data and demonstrations. To address this need, we introduce InterCode-CTF, a novel task environment and benchmark for evaluating language agents on the Capture the Flag (CTF) task. Built as a facsimile of real-world CTF competitions, the InterCode-CTF environment tasks a language agent with finding a flag in a purposely vulnerable computer program. We manually collect and verify a benchmark of 100 task instances that require a range of cybersecurity skills such as reverse engineering, forensics, and binary exploitation, then evaluate current top LMs on this evaluation set. Our preliminary findings indicate that while language agents possess rudimentary cybersecurity knowledge, they are not able to perform multi-step cybersecurity tasks out of the box.
John Yang · Akshara Prabhakar · Shunyu Yao · Kexin Pei · Karthik Narasimhan
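The InterCode-CTF API is not described here, so the loop below is purely schematic: a hypothetical interactive environment in which an agent issues shell commands and is scored on whether it recovers the flag. All names (env, agent.act, env.step, and the two-value return) are assumptions for illustration, not the InterCode-CTF interface.

```python
def evaluate_agent(agent, env, max_steps=15):
    """Schematic interactive evaluation loop for a CTF-style task environment."""
    observation = env.reset()                  # task description + initial shell output
    for _ in range(max_steps):
        command = agent.act(observation)       # the LM proposes the next shell command
        observation, solved = env.step(command)
        if solved:                             # env checks whether the recovered flag matches
            return True
    return False

# solve_rate = sum(evaluate_agent(agent, env) for env in task_instances) / len(task_instances)
```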
- Second-order Jailbreaks: Generative Agents Successfully Manipulate Through an Intermediary (Poster)
As the capabilities of Large Language Models (LLMs) continue to expand, their application in communication tasks is becoming increasingly prevalent. However, this widespread use brings with it novel risks, including the susceptibility of LLMs to "jailbreaking" techniques. In this paper, we explore the potential for such risks in two- and three-agent communication networks, where one agent is tasked with protecting a password while another attempts to uncover it. Our findings reveal that an attacker, powered by advanced LLMs, can extract the password even through an intermediary that is instructed to prevent this. Our contributions include an experimental setup for evaluating the persuasiveness of LLMs, a demonstration of LLMs' ability to manipulate each other into revealing protected information, and a comprehensive analysis of this manipulative behavior. Our results underscore the need for further investigation into the safety and security of LLMs in communication networks.
Mikhail Terekhov · Romain Graux · Eduardo Neville · Denis Rosset · Gabin Kolly
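A schematic sketch of the three-agent relay described above (our illustration, not the paper's code): the attacker only ever talks to the intermediary, which in turn talks to the password-holding defender. The `chat(agent, message)` call standing in for an LLM query, and the substring check for a leak, are assumptions.

```python
def run_relay(attacker, intermediary, defender, chat, password, rounds=10):
    """Hypothetical second-order jailbreak loop: returns True if the password
    ever reaches the attacker despite passing through the intermediary."""
    to_intermediary = "Hello!"
    for _ in range(rounds):
        to_defender = chat(intermediary, to_intermediary)   # intermediary relays / filters
        from_defender = chat(defender, to_defender)          # defender is told to keep the password
        to_attacker = chat(intermediary, from_defender)
        if password in to_attacker:                          # the leak reached the attacker
            return True
        to_intermediary = chat(attacker, to_attacker)        # attacker crafts its next probe
    return False
```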
- RAVE: Enabling safety verification for realistic deep reinforcement learning systems (Poster)
Recent advancements in reinforcement learning (RL) have expedited its success across a wide range of decision-making problems. However, a lack of safety guarantees restricts its use in critical tasks. While recent work has proposed several verification techniques to provide such guarantees, they require that the state-transition function be known and that the reinforcement learning policy be deterministic. Neither property may hold in real environments, which significantly limits the use of existing verification techniques. In this work, we propose two approximation strategies that address this limitation of prior work, allowing the safety verification of RL policies. We demonstrate that by augmenting state-of-the-art verification techniques with our proposed approximation strategies, we can guarantee the safety of non-deterministic RL policies operating in environments with unknown state-transition functions. We theoretically prove that our technique guarantees the safety of an RL policy at runtime. Our experiments on three representative RL tasks empirically verify the efficacy of our method in providing a safety guarantee to a target agent while maintaining its task execution performance.
Wenbo Guo · Taesung Lee · Kevin Eykholt · Jiyong Jang
- Cooperative AI via Decentralized Commitment Devices (Poster)
Credible commitment devices have been a popular approach for robust multi-agent coordination. However, existing commitment mechanisms face limitations around privacy, integrity, and susceptibility to mediator or user strategic behavior. It is unclear whether the cooperative AI techniques we study are robust to real-world incentives and attack vectors. Fortunately, decentralized commitment devices that utilize cryptography have been deployed in the wild, and numerous studies have shown their ability to coordinate algorithmic agents, especially when agents face rational or sometimes adversarial opponents with significant economic incentives, currently on the order of millions to billions of dollars. In this paper, we illustrate potential security issues in cooperative AI via examples from the decentralization literature and, in particular, Maximal Extractable Value (MEV). We call for expanded research into decentralized commitments to advance cooperative AI capabilities for secure coordination in open environments, and for empirical testing frameworks to evaluate multi-agent coordination ability under real-world commitment constraints.
Xyn Sun · Davide Crapis · Matt Stephenson · Jonathan Passerat-Palmbach
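As a toy example of why credible commitments change outcomes (our illustration, not the paper's construction): in a one-shot prisoner's dilemma, agents that can deposit conditional commitments with a trusted, smart-contract-like mediator can reach cooperation that is otherwise unattainable.

```python
# Payoffs (row player, column player) for the one-shot prisoner's dilemma.
PAYOFFS = {("C", "C"): (3, 3), ("C", "D"): (0, 5),
           ("D", "C"): (5, 0), ("D", "D"): (1, 1)}

def resolve(a_commits, b_commits):
    """Mediator logic: a committed agent cooperates iff the other agent also committed;
    an uncommitted agent simply plays the dominant action, defect."""
    a = "C" if (a_commits and b_commits) else "D"
    b = "C" if (a_commits and b_commits) else "D"
    return PAYOFFS[(a, b)]

print(resolve(True, True))    # (3, 3): mutual cooperation becomes enforceable
print(resolve(True, False))   # (1, 1): no unilateral exploitation of the committed agent
```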
- Robust Q-Learning against State Perturbations: a Belief-Enriched Pessimistic Approach (Poster)
Reinforcement learning (RL) has achieved phenomenal success in various domains. However, its data-driven nature also introduces new vulnerabilities that can be exploited by malicious opponents. Recent work shows that a well-trained RL agent can be easily manipulated by strategically perturbing its state observations at the test stage. Existing solutions either introduce a regularization term to improve the smoothness of the trained policy against perturbations or alternately train the agent's policy and the attacker's policy. However, the former does not provide sufficient protection against strong attacks, while the latter is computationally prohibitive for large environments. In this work, we propose a new robust RL algorithm for deriving a pessimistic policy that safeguards against the agent's uncertainty about true states. This approach is further enhanced with belief state inference and diffusion-based state purification to reduce uncertainty. Empirical results show that our approach obtains superb performance under strong attacks and has training overhead comparable to regularization-based methods.
Xiaolin Sun · Zizhan Zheng
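A minimal sketch of the pessimistic idea described above (our illustration, not the paper's algorithm): when acting under possibly perturbed observations, score each action by its worst-case Q-value over a set of plausible true states, for example states within the perturbation budget or samples from an inferred belief.

```python
import numpy as np

def pessimistic_action(q_values, candidate_states):
    """q_values: function mapping a state to a vector of per-action Q-values.
    candidate_states: plausible true states consistent with the perturbed observation."""
    per_state = np.stack([q_values(s) for s in candidate_states])  # (num_states, num_actions)
    worst_case = per_state.min(axis=0)                             # pessimism over the belief set
    return int(np.argmax(worst_case))                              # maximize the worst-case value
```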
- Assessing Risks of Using Autonomous Language Models in Military and Diplomatic Planning (Poster)
The potential integration of autonomous agents into high-stakes military and foreign-policy decision-making has gained prominence, especially with the emergence of advanced generative AI models like GPT-4. This paper scrutinizes the behavior of multiple autonomous agents in simulated military and diplomacy scenarios, focusing specifically on their potential to escalate conflicts. Drawing on established international relations frameworks, we assessed the escalation potential of decisions made by these agents in different scenarios. In contrast to prior qualitative studies, our research provides both qualitative and quantitative insights. We find significant differences in the models' predilections to escalate, with Claude 2 being the least aggressive and GPT-4-Base the most aggressive model. Our findings indicate that, even in seemingly neutral contexts, language-model-based autonomous agents occasionally opt for aggressive or provocative actions. This tendency intensifies in scenarios with predefined trigger events. Importantly, the patterns behind such escalatory behavior remain largely unpredictable. Furthermore, a qualitative analysis of the models' verbalized reasoning, particularly for the GPT-4-Base model, reveals concerning justifications. Given the high stakes involved in military and foreign-policy contexts, the deployment of such autonomous agents demands further examination and cautious consideration.
Gabe Mukobi · Ann-Katrin Reuel · Juan-Pablo Rivera · Chandler Smith
- Stackelberg Games with Side Information (Poster)
We study an online learning setting in which a leader interacts with a sequence of followers over the course of $T$ rounds. At each round, the leader commits to a mixed strategy over actions, after which the follower best-responds. Such settings are referred to in the literature as Stackelberg games. Stackelberg games have received much interest from the community, in part due to their applicability to real-world security settings such as wildlife preservation and airport security. However, despite this recent interest, current models of Stackelberg games fail to take into account the fact that the players' optimal strategies often depend on external factors such as weather patterns, airport traffic, etc. We address this gap by allowing player payoffs to depend on an external context, in addition to the actions taken by each player. We formalize this setting as a repeated Stackelberg game with side information and show that, in this setting, it is impossible to achieve sublinear regret if both the sequence of contexts and the sequence of followers are chosen adversarially. Motivated by this impossibility result, we consider two natural relaxations: (1) stochastically chosen contexts with adversarially chosen followers and (2) stochastically chosen followers with adversarially chosen contexts. In each of these settings, we provide algorithms that obtain $\tilde{\mathcal{O}}(\sqrt{T})$ regret.
Keegan Harris · Steven Wu · Maria-Florina Balcan
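One plausible way to write down the contextual regret that the abstract refers to (our formalization for illustration; the paper's exact notation may differ) compares the leader to the best context-dependent commitment policy in hindsight:
$$R(T) \;=\; \max_{\pi \in \Pi} \sum_{t=1}^{T} u\big(\pi(z_t),\, b_{f_t}(\pi(z_t)),\, z_t\big) \;-\; \sum_{t=1}^{T} u\big(x_t,\, b_{f_t}(x_t),\, z_t\big),$$
where $z_t$ is the round-$t$ context, $x_t$ the leader's committed mixed strategy, $f_t$ the follower at round $t$, $b_f(\cdot)$ that follower's best response, $u$ the leader's payoff, and $\Pi$ a class of context-to-commitment policies. The impossibility result says that no algorithm can keep $R(T)$ sublinear when both $(z_t)$ and $(f_t)$ are adversarial, while the two relaxations admit $\tilde{\mathcal{O}}(\sqrt{T})$ regret.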
- Safe Equilibrium (Poster)
The standard game-theoretic solution concept, Nash equilibrium, assumes that all players behave rationally. If we follow a Nash equilibrium and opponents are irrational (or follow strategies from a different Nash equilibrium), then we may obtain an extremely low payoff. On the other hand, a maximin strategy assumes that all opposing agents are playing to minimize our payoff (even if it is not in their best interest), and ensures the maximal possible worst-case payoff, but results in exceedingly conservative play. We propose a new solution concept called safe equilibrium that models opponents as behaving rationally with a specified probability and behaving potentially arbitrarily with the remaining probability. We prove that a safe equilibrium exists in all strategic-form games (for all possible values of the rationality parameters), and prove that its computation is PPAD-hard.
Samuel Ganzfried
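A rough two-player formalization consistent with the description above (ours; the paper's general $n$-player definition may differ in detail): given a rationality parameter $p \in [0,1]$ for the opponent, a safe strategy solves
$$\sigma_1^{\text{safe}} \;\in\; \arg\max_{\sigma_1} \Big[\, p \cdot u_1\big(\sigma_1, \mathrm{BR}_2(\sigma_1)\big) \;+\; (1-p) \cdot \min_{\sigma_2} u_1(\sigma_1, \sigma_2) \,\Big],$$
so the opponent is assumed to best-respond with probability $p$ and to play arbitrarily (worst case) with probability $1-p$; $p = 1$ recovers equilibrium-style play and $p = 0$ recovers the maximin strategy.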
- Leading the Pack: N-player Opponent Shaping (Poster)
Reinforcement learning solutions have had great success in the 2-player general-sum setting. In this setting, the paradigm of Opponent Shaping (OS), in which agents account for the learning of their co-players, has led to agents that are able to avoid collectively bad outcomes whilst also maximizing their reward. These methods have so far been limited to 2-player games. However, the real world involves interactions with many more agents, on both local and global scales. In this paper, we extend Opponent Shaping (OS) methods to environments involving multiple co-players and multiple shaping agents. We evaluate on 4 different environments, varying the number of players from 3 to 5, and demonstrate that model-based OS methods converge to equilibria with better global welfare than naive learning. However, we find that when playing with a large number of co-players, OS methods' relative performance is reduced, suggesting that in the limit OS methods may not perform well. Finally, we explore scenarios in which more than one OS method is present, observing that within games requiring a majority of cooperating agents, OS methods converge to outcomes with poor global welfare.
Alexandra Souly · Timon Willi · Akbir Khan · Robert Kirk · Chris Lu · Edward Grefenstette · Tim Rocktäschel
- Harnessing the Power of Federated Learning in Federated Contextual Bandits (Poster)
Federated contextual bandits (FCB), as a pivotal instance of combining federated learning (FL) and sequential decision-making, have received growing interest in recent years. However, existing FCB designs often adopt FL protocols tailored for specific settings, deviating from the canonical FL framework. Such disconnections not only prohibit these designs from flexibly leveraging canonical FL algorithmic approaches but also set considerable barriers for FCB to incorporate growing studies on FL attributes such as robustness and privacy. To promote a closer relationship between FL and FCB, we propose a novel FCB design, FedIGW, which can flexibly incorporate both existing and future FL protocols and thus is capable of harnessing the full spectrum of FL advances.
Chengshuai Shi · Kun Yang · Ruida Zhou · Cong Shen
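The name FedIGW suggests the inverse-gap-weighting exploration rule from the contextual bandit literature; the snippet below shows that standard rule in isolation. The federated protocol around it is not reproduced here, and the assumption that FedIGW uses exactly this rule is ours, based only on the name.

```python
import numpy as np

def igw_distribution(predicted_rewards, gamma):
    """Inverse gap weighting (as used, e.g., in SquareCB): non-greedy actions get
    probability 1/(K + gamma * gap), and the greedy action gets the remaining mass."""
    rewards = np.asarray(predicted_rewards, dtype=float)
    k = len(rewards)
    best = int(np.argmax(rewards))
    probs = 1.0 / (k + gamma * (rewards[best] - rewards))
    probs[best] = 0.0
    probs[best] = 1.0 - probs.sum()
    return probs

print(igw_distribution([0.1, 0.5, 0.4], gamma=10.0))
```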
- Decentralized agent-based modeling (Poster)
The utility of agent-based models for practical decision making depends upon their ability to recreate populations in great detail and to integrate real-world data streams. However, incorporating this data can be challenging due to privacy concerns. We alleviate this issue by introducing a paradigm for secure agent-based modeling. In particular, we leverage secure multi-party computation to enable decentralized agent-based simulation, calibration, and analysis. We believe this is a critical step towards making agent-based models scalable to real-world applications.
Ayush Chopra · Arnau Quera-Bofarull · Nurullah Giray Kuru · Ramesh Raskar
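To make "secure multi-party computation" concrete, here is a textbook additive-secret-sharing sketch (our illustration, not the paper's protocol): each data holder splits its private value into random shares so that an aggregate statistic can be computed without any single party seeing individual values.

```python
import random

MOD = 2**61 - 1  # arithmetic is done modulo a large prime

def share(value, num_parties):
    """Split a private integer into additive shares that sum to the value mod MOD."""
    shares = [random.randrange(MOD) for _ in range(num_parties - 1)]
    shares.append((value - sum(shares)) % MOD)
    return shares

def aggregate(all_shares):
    """Each party sums the shares it received; only the overall total is revealed."""
    num_parties = len(all_shares[0])
    partial = [sum(s[i] for s in all_shares) % MOD for i in range(num_parties)]
    return sum(partial) % MOD

private_values = [12, 7, 30]                       # e.g., per-holder counts feeding a simulation
all_shares = [share(v, num_parties=3) for v in private_values]
print(aggregate(all_shares))                       # 49, computed without pooling the raw data
```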
- I See You! Robust Measurement of Adversarial Behavior (Poster)
We introduce the study of non-manipulable measures of manipulative behavior in multi-agent systems. We do this through a case study of decentralized finance (DeFi) and blockchain systems, which are salient as real-world, rapidly emerging multi-agent systems with financial incentives for malicious behavior, with participation by algorithmic and AI systems, and with a need for new methods with which to measure levels of manipulative behavior. We introduce a new surveillance metric for measuring malicious behavior and demonstrate its effectiveness in a natural experiment on the Uniswap DeFi ecosystem.
Lars Ankile · Matheus Xavier Ferreira · David Parkes
- Robustness to Multi-Modal Environment Uncertainty in MARL using Curriculum Learning (Spotlight)
Aakriti Agrawal · Rohith Aralikatti · Yanchao Sun · Furong Huang
- Defining and Mitigating Collusion in Multi-Agent Systems (Spotlight)
Jack Foxabbott · Sam Deverett · Kaspar Senft · Samuel Dower · Lewis Hammond
- Multiagent Simulators for Social Networks (Spotlight)
Aditya Surve · Archit Rathod · Mokshit Surana · Gautam Malpani · Aneesh Shamraj · SAINATH SANKEPALLY · Raghav Jain · Swapneel Mehta
- Oracles & Followers: Stackelberg Equilibria in Deep Multi-Agent Reinforcement Learning (Oral)
Matthias Gerstgrasser · David Parkes
- Dynamics Model Based Adversarial Training For Competitive Reinforcement Learning (Spotlight)
Xuan Chen · Guanhong Tao · Xiangyu Zhang
- Beyond Worst-case Attacks: Robust RL with Adaptive Defense via Non-dominated Policies (Spotlight)
Xiangyu Liu · Chenghao Deng · Yanchao Sun · Yongyuan Liang · Furong Huang
- Generation of Games for Opponent Model Differentiation (Spotlight)
David Milec · Viliam Lisy · Christopher Kiekintveld
- Language Agents as Hackers: Evaluating Cybersecurity Skills with Capture the Flag (Oral)
John Yang · Akshara Prabhakar · Shunyu Yao · Kexin Pei · Karthik Narasimhan
- Second-order Jailbreaks: Generative Agents Successfully Manipulate Through an Intermediary (Spotlight)
Mikhail Terekhov · Romain Graux · Eduardo Neville · Denis Rosset · Gabin Kolly
- RAVE: Enabling safety verification for realistic deep reinforcement learning systems (Spotlight)
Wenbo Guo · Taesung Lee · Kevin Eykholt · Jiyong Jang
- Cooperative AI via Decentralized Commitment Devices (Oral)
Xyn Sun · Davide Crapis · Matt Stephenson · Jonathan Passerat-Palmbach
- Robust Q-Learning against State Perturbations: a Belief-Enriched Pessimistic Approach (Spotlight)
Xiaolin Sun · Zizhan Zheng
- Assessing Risks of Using Autonomous Language Models in Military and Diplomatic Planning (Spotlight)
Gabe Mukobi · Ann-Katrin Reuel · Juan-Pablo Rivera · Chandler Smith
- Stackelberg Games with Side Information (Spotlight)
Keegan Harris · Steven Wu · Maria-Florina Balcan
- Safe Equilibrium (Spotlight)
Samuel Ganzfried
- Leading the Pack: N-player Opponent Shaping (Oral)
Alexandra Souly · Timon Willi · Akbir Khan · Robert Kirk · Chris Lu · Edward Grefenstette · Tim Rocktäschel
- Harnessing the Power of Federated Learning in Federated Contextual Bandits (Spotlight)
Chengshuai Shi · Kun Yang · Ruida Zhou · Cong Shen
- Decentralized agent-based modeling (Spotlight)
Ayush Chopra · Arnau Quera-Bofarull · Nurullah Giray Kuru · Ramesh Raskar
- I See You! Robust Measurement of Adversarial Behavior (Oral)
Lars Ankile · Matheus Xavier Ferreira · David Parkes
- TBA (Keynote) | Lewis Hammond
- TBA ([On-Demand] Keynote) | Laura Edelson
- TBA ([On-Demand] Keynote) | Ana-Maria Cretu
- TBA ([On-Demand] Keynote) | Smitha Millie
- TBA ([On-Demand] Keynote) | Stratis Skoulakis
Author Information
Christian Schroeder de Witt (University of Oxford)
I am a 4th-year PhD student conducting fundamental algorithmic research in deep multi-agent reinforcement learning and climate change. I am jointly supervised by Prof. Shimon Whiteson (WhiRL - see my [profile](http://whirl.cs.ox.ac.uk/member/christian-schroeder-de-witt/)) and Prof. Philip Torr (Torr Vision Group).
Hawra Milani (Royal Holloway University of London)
Klaudia Krawiecka (Department of Computer Science)
Swapneel Mehta (Boston Univ. and MIT)
I am a postdoc at BU and MIT researching platform governance. My Ph.D. research dealt with limiting misinformation on social networks using tools from ML and causal inference. I run a research collective, funded by Google and Mozilla, that trains students from the Global South to build tools that improve trust on the social web.
Carla Cremer (University of Oxford)
Martin Strohmeier (armasuisse Science & Technology)
More from the Same Authors
- 2023: Multiagent Simulators for Social Networks | Aditya Surve · Archit Rathod · Mokshit Surana · Gautam Malpani · Aneesh Shamraj · SAINATH SANKEPALLY · Raghav Jain · Swapneel Mehta
- 2023: JaxMARL: Multi-Agent RL Environments in JAX | Alexander Rutherford · Benjamin Ellis · Matteo Gallici · Jonathan Cook · Andrei Lupu · Garðar Ingvarsson · Timon Willi · Akbir Khan · Christian Schroeder de Witt · Alexandra Souly · Saptarashmi Bandyopadhyay · Mikayel Samvelyan · Minqi Jiang · Robert Lange · Shimon Whiteson · Bruno Lacerda · Nick Hawes · Tim Rocktäschel · Chris Lu · Jakob Foerster
- 2022: Expanding Access to ML Research through Student-led Collaboratives | Deep Gandhi · Raghav Jain · Jay Gala · Jhagrut Lalwani · Swapneel Mehta
- 2022 Workshop: Broadening Research Collaborations | Sara Hooker · Rosanne Liu · Pablo Samuel Castro · FatemehSadat Mireshghallah · Sunipa Dev · Benjamin Rosman · João Madeira Araújo · Savannah Thais · Sara Hooker · Sunny Sanyal · Tejumade Afonja · Swapneel Mehta · Tyler Zhu
- 2022 Poster: Discovered Policy Optimisation | Chris Lu · Jakub Kuba · Alistair Letcher · Luke Metz · Christian Schroeder de Witt · Jakob Foerster
- 2022 Poster: Equivariant Networks for Zero-Shot Coordination | Darius Muglich · Christian Schroeder de Witt · Elise van der Pol · Shimon Whiteson · Jakob Foerster