firstbacksecondback
8 Results
Workshop
|
Advancing Adversarial Suffix Transfer Learning on Aligned Large Language Models Hongfu Liu · Yuxi Xie · Ye Wang · Michael Qizhe Shieh |
||
Workshop
|
What Features in Prompts Jailbreak LLMs? Investigating the Mechanisms Behind Attacks Nathalie Kirch · Severin Field · Stephen Casper |
||
Workshop
|
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models Rylan Schaeffer · Dan Valentine · Luke Bailey · James Chua · Zane Durante · Cristobal Eyzaguirre · Joe Benton · Brando Miranda · Henry Sleight · Tony Wang · John Hughes · Rajashree Agrawal · Mrinank Sharma · Scott Emmons · Sanmi Koyejo · Ethan Perez |
||
Workshop
|
When Do Universal Image Jailbreaks Transfer Between Vision-Language Models? Rylan Schaeffer · Dan Valentine · Luke Bailey · James Chua · Cristobal Eyzaguirre · Zane Durante · Joe Benton · Brando Miranda · Henry Sleight · Tony Wang · John Hughes · Rajashree Agrawal · Mrinank Sharma · Scott Emmons · Sanmi Koyejo · Ethan Perez |
||
Workshop
|
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models Rylan Schaeffer · Dan Valentine · Luke Bailey · James Chua · Zane Durante · Cristobal Eyzaguirre · Joe Benton · Brando Miranda · Henry Sleight · Tony Wang · John Hughes · Rajashree Agrawal · Mrinank Sharma · Scott Emmons · Sanmi Koyejo · Ethan Perez |
||
Workshop
|
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models Rylan Schaeffer · Dan Valentine · Luke Bailey · James Chua · Zane Durante · Cristobal Eyzaguirre · Joe Benton · Brando Miranda · Henry Sleight · Tony Wang · John Hughes · Rajashree Agrawal · Mrinank Sharma · Scott Emmons · Sanmi Koyejo · Ethan Perez |
||
Workshop
|
Sun 10:55 |
Contributed Talk 2: Failures to Find Transferable Image Jailbreaks Between Vision-Language Models Rylan Schaeffer · Dan Valentine · Luke Bailey · James Chua · Zane Durante · Cristobal Eyzaguirre · Joe Benton · Brando Miranda · Henry Sleight · Tony Wang · John Hughes · Rajashree Agrawal · Mrinank Sharma · Scott Emmons · Sanmi Koyejo · Ethan Perez |
|
Workshop
|
Sat 10:45 |
When Do Universal Image Jailbreaks Transfer Between Vision-Language Models? Rylan Schaeffer · Dan Valentine · Luke Bailey · James Chua · Cristobal Eyzaguirre · Zane Durante · Joe Benton · Brando Miranda · Henry Sleight · Tony Wang · John Hughes · Rajashree Agrawal · Mrinank Sharma · Scott Emmons · Sanmi Koyejo · Ethan Perez |