

Poster in Workshop: Multi-Agent Security: Security as Key to AI Safety

Language Agents as Hackers: Evaluating Cybersecurity Skills with Capture the Flag

John Yang · Akshara Prabhakar · Shunyu Yao · Kexin Pei · Karthik Narasimhan

Keywords: [ Language Agents ] [ security ] [ Natural Language Processing ] [ Software Engineering ]

presentation: Multi-Agent Security: Security as Key to AI Safety
Sat 16 Dec 7 a.m. PST — 3:30 p.m. PST

Abstract:

Amid the advent of language models (LMs) and their wide-ranging capabilities, concerns have been raised about their implications for privacy and security. In particular, the emergence of language agents as a promising aid for automating and augmenting digital work raises immediate questions about their misuse as malicious cybersecurity actors. With their exceptional compute efficiency and execution speed relative to human counterparts, language agents may be extremely adept at locating vulnerabilities, performing complex social engineering, and hacking real-world systems. Understanding and guiding the development of language agents in the cybersecurity space requires a grounded understanding of their capabilities, founded on empirical data and demonstrations. To address this need, we introduce InterCode-CTF, a novel task environment and benchmark for evaluating language agents on the Capture the Flag (CTF) task. The InterCode-CTF environment is built as a facsimile of real-world CTF competitions: a language agent is tasked with finding a flag in a purposely vulnerable computer program. We manually collect and verify a benchmark of 100 task instances that require a range of cybersecurity skills, such as reverse engineering, forensics, and binary exploitation, then evaluate current leading LMs on this evaluation set. Our preliminary findings indicate that while language agents possess rudimentary cybersecurity knowledge, they are not able to perform multi-step cybersecurity tasks out of the box.
