24 Results
Poster | Wed 11:00 | Improved Generation of Adversarial Examples Against Safety-aligned LLMs | Qizhang Li · Yiwen Guo · Wangmeng Zuo · Hao Chen
Workshop | Sat 13:15 | Keynote 3: Risk assessment, safety alignment, and guardrails for multimodal foundation models | Bo Li
Workshop | Plentiful Jailbreaks with String Compositions | Brian Huang
Workshop | Advancing Adversarial Suffix Transfer Learning on Aligned Large Language Models | Hongfu Liu · Yuxi Xie · Ye Wang · Michael Qizhe Shieh
Workshop | Combining Domain and Alignment Vectors to Achieve Better Knowledge-Safety Trade-offs in LLMs | Megh Thakkar · Yash More · Quentin Fournier · Matthew Riemer · Pin-Yu Chen · Amal Zouaq · Payel Das · Sarath Chandar
Workshop | MISR: Measuring Instrumental Self-Reasoning in Frontier Models | Kai Fronsdal · David Lindner
Poster | Fri 16:30 | SafeSora: Towards Safety Alignment of Text2Video Generation via a Human Preference Dataset | Juntao Dai · Tianle Chen · Xuyao Wang · Ziran Yang · Taiye Chen · Jiaming Ji · Yaodong Yang
Workshop | AEGIS2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails | Shaona Ghosh · Prasoon Varshney · Makesh Narsimhan Sreedhar · Aishwarya Padmakumar · Traian Rebedea · Jibin Varghese · Christopher Parisien
Workshop | Controllable Safety Alignment: Adapting LLMs to Diverse Safety Requirements without Re-Training | Jingyu Zhang · Ahmed Elgohary Ghoneim · Ahmed Magooda · Daniel Khashabi · Ben Van Durme
Workshop | Aligning to What? Limits to RLHF Based Alignment | Logan Barnhart · Reza Akbarian Bafghi · Maziar Raissi · Stephen Becker
Workshop | Efficacy of the SAGE-RT Dataset for Model Safety Alignment: A Comparative Study | Tanay Baswa · Nitin Aravind Birur · Divyanshu Kumar · Jatan Loya · Anurakt Kumar · Prashanth Harshangi · Sahil Agarwal