23 Results
Workshop | Plentiful Jailbreaks with String Compositions | Brian Huang
Workshop | Does Refusal Training in LLMs Generalize to the Past Tense? | Maksym Andriushchenko · Nicolas Flammarion
Poster | Thu 16:30 | Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks | Andy Zhou · Bo Li · Haohan Wang
Workshop | AutoDefense: Multi-Agent LLM Defense against Jailbreak Attacks | Yifan Zeng · Yiran Wu · Xiao Zhang · Huazheng Wang · Qingyun Wu
Workshop | Infecting LLM Agents via Generalizable Adversarial Attack | Weichen Yu · Kai Hu · Tianyu Pang · Chao Du · Min Lin · Matt Fredrikson
Workshop | Jailbreaking Large Language Models with Symbolic Mathematics | Emet Bethany · Mazal Bethany · Juan Nolazco-Flores · Sumit Jha · Peyman Najafirad
Workshop | Sun 16:50 | Contributed Talk 6: Infecting LLM Agents via Generalizable Adversarial Attack | Weichen Yu · Kai Hu · Tianyu Pang · Chao Du · Min Lin · Matt Fredrikson
Poster | Thu 11:00 | Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs | Zhao Xu · Fan Liu · Hao Liu
Workshop | LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet | Nathaniel Li · Ziwen Han · Ian Steneker · Willow Primack · Riley Goodside · Hugh Zhang · Zifan Wang · Cristina Menghini · Summer Yue
Poster | Thu 11:00 | Tree of Attacks: Jailbreaking Black-Box LLMs Automatically | Anay Mehrotra · Manolis Zampetakis · Paul Kassianik · Blaine Nelson · Hyrum Anderson · Yaron Singer · Amin Karbasi
Workshop | Sun 11:05 | Contributed Talk 3: LLM Defenses Are Not Robust to Multi-Turn Human Jailbreaks Yet | Nathaniel Li · Ziwen Han · Ian Steneker · Willow Primack · Riley Goodside · Hugh Zhang · Zifan Wang · Cristina Menghini · Summer Yue