firstbacksecondback
84 Results
Workshop
|
Failures to Find Transferable Image Jailbreaks Between Vision-Language Models Rylan Schaeffer · Dan Valentine · Luke Bailey · James Chua · Zane Durante · Cristobal Eyzaguirre · Joe Benton · Brando Miranda · Henry Sleight · Tony Wang · John Hughes · Rajashree Agrawal · Mrinank Sharma · Scott Emmons · Sanmi Koyejo · Ethan Perez |
||
Workshop
|
Sun 10:55 |
Contributed Talk 2: Failures to Find Transferable Image Jailbreaks Between Vision-Language Models Rylan Schaeffer · Dan Valentine · Luke Bailey · James Chua · Zane Durante · Cristobal Eyzaguirre · Joe Benton · Brando Miranda · Henry Sleight · Tony Wang · John Hughes · Rajashree Agrawal · Mrinank Sharma · Scott Emmons · Sanmi Koyejo · Ethan Perez |
|
Poster
|
Wed 11:00 |
Dual Risk Minimization: Towards Next-Level Robustness in Fine-tuning Zero-Shot Models Kaican Li · Weiyan XIE · Yongxiang Huang · Didan Deng · Lanqing Hong · Zhenguo Li · Ricardo Silva · Nevin L. Zhang |
|
Poster
|
Fri 11:00 |
Group Robust Preference Optimization in Reward-free RLHF Shyam Sundhar Ramesh · Yifan Hu · Iason Chaimalas · Viraj Mehta · Pier Giuseppe Sessa · Haitham Bou Ammar · Ilija Bogunovic |
|
Poster
|
Fri 11:00 |
JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models Patrick Chao · Edoardo Debenedetti · Alexander Robey · Maksym Andriushchenko · Francesco Croce · Vikash Sehwag · Edgar Dobriban · Nicolas Flammarion · George J. Pappas · Florian Tramer · Hamed Hassani · Eric Wong |
|
Workshop
|
Shh, don't say that! Domain Certification in LLMs Cornelius Emde · Preetham Arvind · Alasdair Paren · Maxime Kayser · Thomas Rainforth · Thomas Lukasiewicz · Philip Torr · Adel Bibi |
||
Workshop
|
Robust Feature Learning for Multi-Index Models in High Dimensions Alireza Mousavi-Hosseini · Adel Javanmard · Murat Erdogdu |
||
Workshop
|
Forget to Flourish: Leveraging Machine-Unlearning on Pretrained Language Models for Privacy Leakage Rafi Rashid · Jing Liu · Toshiaki Koike-Akino · Shagufta Mehnaz · Ye Wang |
||
Workshop
|
Coordinated Robustness Evaluation Framework for Vision Language Models Ashwin Ramesh Babu · Sajad Mousavi · Desik Rengarajan · Vineet Gundecha · Sahand Ghorbanpour · Avisek Naug · Antonio Guillen-Perez · Ricardo Luna Gutierrez · Soumyendu Sarkar |
||
Workshop
|
Leveraging Periodicity for Robustness with Multi-modal Mood Pattern Models Jaya Narain · Qinhua Sun · Oussama Elachqar · Haraldur Hallgrimsson · Feng Zhu · Shirley Ren |
||
Workshop
|
Learning Diverse Attacks on Large Language Models for Robust Red-Teaming and Safety Tuning Seanie Lee · Minsu Kim · Lynn Cherif · David Dobre · Juho Lee · Sung Ju Hwang · Kenji Kawaguchi · Gauthier Gidel · Yoshua Bengio · Nikolay Malkin · Moksh Jain |
||
Workshop
|
Sat 10:45 |
When Do Universal Image Jailbreaks Transfer Between Vision-Language Models? Rylan Schaeffer · Dan Valentine · Luke Bailey · James Chua · Cristobal Eyzaguirre · Zane Durante · Joe Benton · Brando Miranda · Henry Sleight · Tony Wang · John Hughes · Rajashree Agrawal · Mrinank Sharma · Scott Emmons · Sanmi Koyejo · Ethan Perez |