Search All 2024 Events
 

65 Results

Page 5 of 6
Affinity Event
The Root Shapes the Fruit: On the Persistence of Gender-Exclusive Harms in Aligned Language Models
Anaelia Ovalle · Krunoslav Lehman Pavasovic · Louis Martin · Luke Zettlemoyer · Eric Michael Smith · Kai-Wei Chang · Adina Williams · Levent Sagun
Poster
Thu 11:00 Alignment at Pre-training! Towards Native Alignment for Arabic LLMs
Juhao Liang · Zhenyang Cai · Jianqing Zhu · Huang Huang · Kewei Zong · Bang An · Mosen Alharthi · Juncai He · Lian Zhang · Haizhou Li · Benyou Wang · Jinchao Xu
Poster
Fri 16:30 Improving Alignment and Robustness with Circuit Breakers
Andy Zou · Long Phan · Justin Wang · Derek Duenas · Maxwell Lin · Maksym Andriushchenko · J. Zico Kolter · Matt Fredrikson · Dan Hendrycks
Poster
Wed 11:00 Distributional Preference Alignment of LLMs via Optimal Transport
Igor Melnyk · Youssef Mroueh · Brian Belgodere · Mattia Rigotti · Apoorva Nitsure · Mikhail Yurochkin · Kristjan Greenewald · Jiri Navratil · Jarret Ross
Workshop
AEGIS2.0: A Diverse AI Safety Dataset and Risks Taxonomy for Alignment of LLM Guardrails
Shaona Ghosh · Prasoon Varshney · Makesh Narsimhan Sreedhar · Aishwarya Padmakumar · Traian Rebedea · Jibin Varghese · Christopher Parisien
Poster
Wed 11:00 Improved Few-Shot Jailbreaking Can Circumvent Aligned Language Models and Their Defenses
Xiaosen Zheng · Tianyu Pang · Chao Du · Qian Liu · Jing Jiang · Min Lin
Workshop
MID-Space: Aligning Diverse Communities' Needs to Inclusive Public Spaces
Shravan Nayak · Rashid Mushkani · Hugo Berard · Allison Cohen · Shin Koseki · Hadrien Bertrand
Workshop
Personalized Soups: Personalized Large Language Model Alignment via Post-hoc Parameter Merging
Joel Jang · Seungone Kim · Bill Yuchen Lin · Yizhong Wang · Jack Hessel · Luke Zettlemoyer · Hannaneh Hajishirzi · Yejin Choi · Prithviraj Ammanabrolu
Workshop
Visual Language Alignment Tuning
LE ZHANG · Qian Yang · Aishwarya Agrawal
Workshop
Rule-Guided Language Model Alignment for Text Generation Management in Industrial Use Cases
Shunichi Akatsuka · Aman Kumar · Xian Yeow Lee · Lasitha Vidyaratne · Dipanjan Ghosh · Ahmed Farahat
Workshop
Auto-Enhance: Towards a Meta-Benchmark to Evaluate AI Agents' Ability to Improve Other Agents
Samuel Brown · Basil Labib · Codruta Lugoj · Sai Sasank Y
Workshop
Moral Persuasion in Large Language Models: Evaluating Susceptibility and Ethical Alignment
Allison Huang · Carlos Mougan · Yulu Pi