Search All 2024 Events

18 Results

Page 1 of 2
Workshop
Sat 15:00 Phillip Isola (MIT): Representation Learning from Human Feedback
Workshop
Aligning to What? Limits to RLHF Based Alignment
Logan Barnhart · Reza Akbarian Bafghi · Maziar Raissi · Stephen Becker
Workshop
Sat 16:00 Contributed Talk: Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning
Workshop
Personalized Language Modeling from Personalized Human Feedback
Xinyu Li · Ruiyang Zhou · Zachary Lipton · Liu Leqi
Workshop
Sat 15:45 Taming False Positives in Out-of-Distribution Detection with Human Feedback
Harit Vishwakarma · Heguang Lin · Ramya Korlakai Vinayak
Poster
Fri 11:00 Online Iterative Reinforcement Learning from Human Feedback with General Preference Model
Chenlu Ye · Wei Xiong · Yuheng Zhang · Hanze Dong · Nan Jiang · Tong Zhang
Oral
Wed 10:40 The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models
Hannah Rose Kirk · Alexander Whitefield · Paul Röttger · Andrew M. Bean · Katerina Margatina · Rafael Mosquera-Gomez · Juan Ciro · Max Bartolo · Adina Williams · He He · Bertie Vidgen · Scott Hale
Poster
Wed 11:00 The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models
Hannah Rose Kirk · Alexander Whitefield · Paul Röttger · Andrew M. Bean · Katerina Margatina · Rafael Mosquera-Gomez · Juan Ciro · Max Bartolo · Adina Williams · He He · Bertie Vidgen · Scott Hale
Poster
Wed 11:00 When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback
Leon Lang · Davis Foote · Stuart J Russell · Anca Dragan · Erik Jenner · Scott Emmons
Workshop
Targeted Manipulation and Deception Emerge in LLMs Trained on User* Feedback
Marcus Williams · Micah Carroll · Constantin Weisser · Brendan Murphy · Adhyyan Narang · Anca Dragan
Poster
Thu 16:30 Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning
Sriyash Poddar · Yanming Wan · Hamish Ivison · Abhishek Gupta · Natasha Jaques
Workshop
Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning
Sriyash Poddar · Yanming Wan · Hamish Ivison · Abhishek Gupta · Natasha Jaques