

Search All 2024 Events

13 Results (Page 1 of 2)
Poster
Thu 11:00 Intruding with Words: Towards Understanding Graph Injection Attacks at the Text Level
Runlin Lei · Yuwei Hu · Yuchen Ren · Zhewei Wei
Workshop
Sun 14:40 Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs
Giulio Zizzo · Giandomenico Cornacchia · Kieran Fraser · Muhammad Zaid Hameed · Ambrish Rawat · Beat Buesser · Mark Purcell · Pin-Yu Chen · Prasanna Sattigeri · Kush Varshney
Workshop
LLM-PIRATE: A benchmark for indirect prompt injection attacks in Large Language Models
Anil Ramakrishna · Jimit Majmudar · Rahul Gupta · Devamanyu Hazarika
Poster
Wed 16:30 Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor
Shaokui Wei · Hongyuan Zha · Baoyuan Wu
Poster
Thu 16:30 AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents
Edoardo Debenedetti · Jie Zhang · Mislav Balunovic · Luca Beurer-Kellner · Marc Fischer · Florian Tramer
Poster
Wed 16:30 Are Your Models Still Fair? Fairness Attacks on Graph Neural Networks via Node Injections
Zihan Luo · Hong Huang · Yongkang Zhou · Jiping Zhang · Nuo Chen · Hai Jin
Poster
Thu 16:30 Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
Andy Zhou · Bo Li · Haohan Wang
Workshop
Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI
Ambrish Rawat · Stefan Schoepf · Giulio Zizzo · Giandomenico Cornacchia · Muhammad Zaid Hameed · Kieran Fraser · Erik Miehling · Beat Buesser · Elizabeth Daly · Mark Purcell · Prasanna Sattigeri · Pin-Yu Chen · Kush Varshney
Poster
Fri 11:00 Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration
Wenjie Fu · Huandong Wang · Chen Gao · Guanghua Liu · Yong Li · Tao Jiang
Poster
Thu 16:30 Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space
Leo Schwinn · David Dobre · Sophie Xhonneux · Gauthier Gidel · Stephan Günnemann
Workshop
What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks
Nathalie Kirch · Severin Field · Stephen Casper