All 2024 Events — 126 Results (Page 2 of 11)
Poster
Wed 16:30 MLLMGuard: A Multi-dimensional Safety Evaluation Suite for Multimodal Large Language Models
Tianle Gu · Zeyang Zhou · Kexin Huang · Liang Dandan · Yixu Wang · Haiquan Zhao · Yuanqi Yao · Xingge Qiao · Keqing Wang · Yujiu Yang · Yan Teng · Yu Qiao · Yingchun Wang
Poster
Thu 16:30 WildGuard: Open One-stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
Seungju Han · Kavel Rao · Allyson Ettinger · Liwei Jiang · Bill Yuchen Lin · Nathan Lambert · Yejin Choi · Nouha Dziri
Poster
Thu 11:00 MedSafetyBench: Evaluating and Improving the Medical Safety of Large Language Models
Tessa Han · Aounon Kumar · Chirag Agarwal · Himabindu Lakkaraju
Oral Session
Fri 15:30 Oral Session 6A: Machine Learning and Science, Safety
Poster
Wed 16:30 What Makes and Breaks Safety Fine-tuning? A Mechanistic Study
Samyak Jain · Ekdeep S Lubana · Kemal Oksuz · Tom Joy · Philip Torr · Amartya Sanyal · Puneet Dokania
Poster
Fri 16:30 The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains
Ezra Edelman · Nikolaos Tsilivis · Benjamin Edelman · Eran Malach · Surbhi Goel
Poster
Thu 11:00 BackdoorAlign: Mitigating Fine-tuning based Jailbreak Attack with Backdoor Enhanced Safety Alignment
Jiongxiao Wang · Jiazhao Li · Yiquan Li · Xiangyu Qi · Junjie Hu · Sharon Li · Patrick McDaniel · Muhao Chen · Bo Li · Chaowei Xiao
Poster
Wed 11:00 SafeWorld: Geo-Diverse Safety Alignment
Da Yin · Haoyi Qiu · Kung-Hsiang Huang · Kai-Wei Chang · Nanyun Peng
Poster
Fri 16:30 Improving Alignment and Robustness with Circuit Breakers
Andy Zou · Long Phan · Justin Wang · Derek Duenas · Maxwell Lin · Maksym Andriushchenko · J. Zico Kolter · Matt Fredrikson · Dan Hendrycks
Poster
Wed 11:00 Navigating the Safety Landscape: Measuring Risks in Finetuning Large Language Models
ShengYun Peng · Pin-Yu Chen · Matthew Hull · Duen Horng Chau
Poster
Fri 11:00 Safe LoRA: The Silver Lining of Reducing Safety Risks when Finetuning Large Language Models
Chia-Yi Hsu · Yu-Lin Tsai · Chih-Hsun Lin · Pin-Yu Chen · Chia-Mu Yu · Chun-Ying Huang
Affinity Event
Beyond the Safety Bundle: Auditing the Helpful and Harmless Dataset
Khaoula Chehbouni · Jonathan Colaço Carr · Yash More · Jackie CK Cheung · Golnoosh Farnadi