13 Results
Poster | Thu 11:00 | Intruding with Words: Towards Understanding Graph Injection Attacks at the Text Level | Runlin Lei · Yuwei Hu · Yuchen Ren · Zhewei Wei
Workshop | Sun 14:40 | Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs |
Workshop | | LLM-PIRATE: A benchmark for indirect prompt injection attacks in Large Language Models | Anil Ramakrishna · Jimit Majmudar · Rahul Gupta · Devamanyu Hazarika
Poster | Wed 16:30 | Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor | Shaokui Wei · Hongyuan Zha · Baoyuan Wu
Poster | Thu 16:30 | AgentDojo: A Dynamic Environment to Evaluate Prompt Injection Attacks and Defenses for LLM Agents | Edoardo Debenedetti · Jie Zhang · Mislav Balunovic · Luca Beurer-Kellner · Marc Fischer · Florian Tramer
Poster | Wed 16:30 | Are Your Models Still Fair? Fairness Attacks on Graph Neural Networks via Node Injections | Zihan Luo · Hong Huang · Yongkang Zhou · Jiping Zhang · Nuo Chen · Hai Jin
Poster | Thu 16:30 | Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks | Andy Zhou · Bo Li · Haohan Wang
Workshop | | Attack Atlas: A Practitioner's Perspective on Challenges and Pitfalls in Red Teaming GenAI | Ambrish Rawat · Stefan Schoepf · Giulio Zizzo · Giandomenico Cornacchia · Muhammad Zaid Hameed · Kieran Fraser · Erik Miehling · Beat Buesser · Elizabeth Daly · Mark Purcell · Prasanna Sattigeri · Pin-Yu Chen · Kush Varshney
Poster | Fri 11:00 | Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration | Wenjie Fu · Huandong Wang · Chen Gao · Guanghua Liu · Yong Li · Tao Jiang
Poster | Thu 16:30 | Soft Prompt Threats: Attacking Safety Alignment and Unlearning in Open-Source LLMs through the Embedding Space | Leo Schwinn · David Dobre · Sophie Xhonneux · Gauthier Gidel · Stephan Günnemann
Workshop | | What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks | Nathalie Kirch · Severin Field · Stephen Casper
Workshop | | Adversarial Prompt Evaluation: Systematic Benchmarking of Guardrails Against Prompt Input Attacks on LLMs | Giulio Zizzo · Giandomenico Cornacchia · Kieran Fraser · Muhammad Zaid Hameed · Ambrish Rawat · Beat Buesser · Mark Purcell · Pin-Yu Chen · Prasanna Sattigeri · Kush Varshney