Workshop
HarmAnalyst: Interpretable, transparent, and steerable LLM safety moderation
Jing-Jing Li · Valentina Pyatkin · Max Kleiman-Weiner · Liwei Jiang · Nouha Dziri · Anne Collins · Jana Schaich Borg · Maarten Sap · Yejin Choi · Sydney Levine

Workshop
Declarative characterizations of direct preference alignment algorithms
Kyle Richardson · Vivek Srikumar · Ashish Sabharwal

Poster
Wed 11:00 The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models
Hannah Rose Kirk · Alexander Whitefield · Paul Rottger · Andrew M. Bean · Katerina Margatina · Rafael Mosquera-Gomez · Juan Ciro · Max Bartolo · Adina Williams · He He · Bertie Vidgen · Scott Hale

Oral
Wed 10:40 The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models
Hannah Rose Kirk · Alexander Whitefield · Paul Rottger · Andrew M. Bean · Katerina Margatina · Rafael Mosquera-Gomez · Juan Ciro · Max Bartolo · Adina Williams · He He · Bertie Vidgen · Scott Hale

Workshop
Fine-Tuning Language Models for Ethical Ambiguity: A Comparative Study of Alignment with Human Responses
Pranav Senthilkumar · Visshwa Balasubramanian · Aneesa Maity · Prisha Jain · Kevin Zhu · Jonathan Lu