179 Results
Workshop | | Between the Bars: Gradient-based Jailbreaks are Bugs that induce Features | Kaivalya Hariharan · Uzay Girit
Workshop | | Hiding-in-Plain-Sight (HiPS) Attack on CLIP for Targetted Object Removal from Images | Megan Chung · Arka Daw · Maria Mahbub · Amir Sadovnik
Workshop | | Cold Posterior Effect towards Adversarial Robustness | Bruce Rushing · Antonios Alexos · Harrison Espino · Nicholas Cohen · Pierre Baldi
Workshop | Sat 15:45 | Adversarial Robust Deep Reinforcement Learning is Neither Robust Nor Safe | Ezgi Korkmaz
Workshop | | Shh, don't say that! Domain Certification in LLMs | Cornelius Emde · Preetham Arvind · Alasdair Paren · Maxime Kayser · Thomas Rainforth · Thomas Lukasiewicz · Philip Torr · Adel Bibi
Workshop | | Smoothing-Based Adversarial Defense Methods for Inverse Problems | Yang Sun · Jonathan Scarlett
Workshop | | An Adversarial Perspective on Machine Unlearning for AI Safety | Jakub Łucki · Boyi Wei · Yangsibo Huang · Peter Henderson · Florian Tramer · Javier Rando
Workshop | | Plentiful Jailbreaks with String Compositions | Brian Huang
Workshop | | Rethinking Adversarial Attacks as Protection Against Diffusion-based Mimicry | Haotian Xue · Yongxin Chen
Poster | Wed 11:00 | Erasing Undesirable Concepts in Diffusion Models with Adversarial Preservation | Anh Bui · Tung-Long Vuong · Khanh Doan · Trung Le · Paul Montague · Tamas Abraham · Dinh Phung
Workshop | | Robust Feature Learning for Multi-Index Models in High Dimensions | Alireza Mousavi-Hosseini · Adel Javanmard · Murat Erdogdu