24 Results

Page 1 of 2
Poster
Wed 11:00 Improved Generation of Adversarial Examples Against Safety-aligned LLMs
Qizhang Li · Yiwen Guo · Wangmeng Zuo · Hao Chen
Workshop
Sat 13:15 Keynote 3: Risk assessment, safety alignment, and guardrails for multimodal foundation models
Bo Li
Workshop
Plentiful Jailbreaks with String Compositions
Brian Huang
Workshop
Advancing Adversarial Suffix Transfer Learning on Aligned Large Language Models
Hongfu Liu · Yuxi Xie · Ye Wang · Michael Qizhe Shieh
Workshop
Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs
Megh Thakkar · Yash More · Quentin Fournier · Matthew Riemer · Pin-Yu Chen · Amal Zouaq · Payel Das · Sarath Chandar
Workshop
MISR: Measuring Instrumental Self-Reasoning in Frontier Models
Kai Fronsdal · David Lindner
Poster
Fri 16:30 SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset
Juntao Dai · Tianle Chen · Xuyao Wang · Ziran Yang · Taiye Chen · Jiaming Ji · Yaodong Yang
Workshop
AEGIS2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails
Shaona Ghosh · Prasoon Varshney · Makesh Narsimhan Sreedhar · Aishwarya Padmakumar · Traian Rebedea · Jibin Varghese · Christopher Parisien
Workshop
Controllable Safety Alignment: Adapting LLMs to Diverse Safety Requirements without Re-Training
Jingyu Zhang · Ahmed Elgohary Ghoneim · Ahmed Magooda · Daniel Khashabi · Ben Van Durme
Workshop
Aligning to What? Limits to RLHF Based Alignment
Logan Barnhart · Reza Akbarian Bafghi · Maziar Raissi · Stephen Becker
Workshop
Efficacy of the SAGE-RT Dataset for Model Safety Alignment: A Comparative Study
Tanay Baswa · Nitin Aravind Birur · Divyanshu Kumar · Jatan Loya · Anurakt Kumar · Prashanth Harshangi · Sahil Agarwal