Poster | Wed 16:30 | MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models | Tianle Gu · Zeyang Zhou · Kexin Huang · Liang Dandan · Yixu Wang · Haiquan Zhao · Yuanqi Yao · Xingge Qiao · Keqing Wang · Yujiu Yang · Yan Teng · Yu Qiao · Yingchun Wang

Poster | Thu 16:30 | WildGuard: Open One-stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs | Seungju Han · Kavel Rao · Allyson Ettinger · Liwei Jiang · Bill Yuchen Lin · Nathan Lambert · Yejin Choi · Nouha Dziri

Poster | Thu 11:00 | MedSafetyBench: Evaluating and Improving the Medical Safety of Large Language Models | Tessa Han · Aounon Kumar · Chirag Agarwal · Himabindu Lakkaraju

Oral Session | Fri 15:30 | Oral Session 6A: Machine Learning and Science, Safety

Poster | Wed 16:30 | What Makes and Breaks Safety Fine-tuning? A Mechanistic Study | Samyak Jain · Ekdeep S Lubana · Kemal Oksuz · Tom Joy · Philip Torr · Amartya Sanyal · Puneet Dokania

Poster | Fri 16:30 | The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains | Ezra Edelman · Nikolaos Tsilivis · Benjamin Edelman · Eran Malach · Surbhi Goel

Poster | Thu 11:00 | BackdoorAlign: Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment | Jiongxiao Wang · Jiazhao Li · Yiquan Li · Xiangyu Qi · Junjie Hu · Sharon Li · Patrick McDaniel · Muhao Chen · Bo Li · Chaowei Xiao

Poster | Wed 11:00 | SafeWorld: Geo-Diverse Safety Alignment | Da Yin · Haoyi Qiu · Kung-Hsiang Huang · Kai-Wei Chang · Nanyun Peng

Poster | Fri 16:30 | Improving Alignment and Robustness with Circuit Breakers | Andy Zou · Long Phan · Justin Wang · Derek Duenas · Maxwell Lin · Maksym Andriushchenko · J. Zico Kolter · Matt Fredrikson · Dan Hendrycks

Poster | Wed 11:00 | Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models | ShengYun Peng · Pin-Yu Chen · Matthew Hull · Duen Horng Chau

Poster | Fri 11:00 | Safe LoRA: The Silver Lining of Reducing Safety Risks when Finetuning Large Language Models | Chia-Yi Hsu · Yu-Lin Tsai · Chih-Hsun Lin · Pin-Yu Chen · Chia-Mu Yu · Chun-Ying Huang

Affinity Event | Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset | Khaoula Chehbouni · Jonathan Colaço Carr · Yash More · Jackie CK Cheung · Golnoosh Farnadi