65 Results
Affinity Event
The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models
Anaelia Ovalle · Krunoslav Lehman Pavasovic · Louis Martin · Luke Zettlemoyer · Eric Michael Smith · Kai-Wei Chang · Adina Williams · Levent Sagun

Poster · Thu 11:00
Alignment at Pre-training! Towards Native Alignment for Arabic LLMs
Juhao Liang · Zhenyang Cai · Jianqing Zhu · Huang Huang · Kewei Zong · Bang An · Mosen Alharthi · Juncai He · Lian Zhang · Haizhou Li · Benyou Wang · Jinchao Xu

Poster · Fri 16:30
Improving Alignment and Robustness with Circuit Breakers
Andy Zou · Long Phan · Justin Wang · Derek Duenas · Maxwell Lin · Maksym Andriushchenko · J. Zico Kolter · Matt Fredrikson · Dan Hendrycks

Poster · Wed 11:00
Distributional Preference Alignment of LLMs via Optimal Transport
Igor Melnyk · Youssef Mroueh · Brian Belgodere · Mattia Rigotti · Apoorva Nitsure · Mikhail Yurochkin · Kristjan Greenewald · Jiri Navratil · Jarret Ross

Workshop
AEGIS2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails
Shaona Ghosh · Prasoon Varshney · Makesh Narsimhan Sreedhar · Aishwarya Padmakumar · Traian Rebedea · Jibin Varghese · Christopher Parisien

Poster · Wed 11:00
Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses
Xiaosen Zheng · Tianyu Pang · Chao Du · Qian Liu · Jing Jiang · Min Lin

Workshop
MID-Space: Aligning Diverse Communities' Needs to Inclusive Public Spaces
Shravan Nayak · Rashid Mushkani · Hugo Berard · Allison Cohen · Shin Koseki · Hadrien Bertrand

Workshop
Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging
Joel Jang · Seungone Kim · Bill Yuchen Lin · Yizhong Wang · Jack Hessel · Luke Zettlemoyer · Hannaneh Hajishirzi · Yejin Choi · Prithviraj Ammanabrolu

Workshop
Visual Language Alignment Tuning
LE ZHANG · Qian Yang · Aishwarya Agrawal

Workshop
Rule-Guided Language Model Alignment for Text Generation Management in Industrial Use Cases
Shunichi Akatsuka · Aman Kumar · Xian Yeow Lee · Lasitha Vidyaratne · Dipanjan Ghosh · Ahmed Farahat

Workshop
Auto-Enhance: Towards a Meta-Benchmark to Evaluate AI Agents' Ability to Improve Other Agents
Samuel Brown · Basil Labib · Codruta Lugoj · Sai Sasank Y

Workshop
Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment
Allison Huang · Carlos Mougan · Yulu Pi