18 Results
Type | Time | Title | Authors
Workshop | Sat 15:00 | Phillip Isola (MIT): Representation Learning from Human Feedback | Phillip Isola
Workshop | | Aligning to What? Limits to RLHF Based Alignment | Logan Barnhart · Reza Akbarian Bafghi · Maziar Raissi · Stephen Becker
Workshop | Sat 16:00 | Contributed Talk: Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning |
Workshop | | Personalized Language Modeling from Personalized Human Feedback | Xinyu Li · Ruiyang Zhou · Zachary Lipton · Liu Leqi
Workshop | Sat 15:45 | Taming False Positives in Out-of-Distribution Detection with Human Feedback | Harit Vishwakarma · Heguang Lin · Ramya Korlakai Vinayak
Poster | Fri 11:00 | Online Iterative Reinforcement Learning from Human Feedback with General Preference Model | Chenlu Ye · Wei Xiong · Yuheng Zhang · Hanze Dong · Nan Jiang · Tong Zhang
Oral | Wed 10:40 | The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models | Hannah Rose Kirk · Alexander Whitefield · Paul Rottger · Andrew M. Bean · Katerina Margatina · Rafael Mosquera-Gomez · Juan Ciro · Max Bartolo · Adina Williams · He He · Bertie Vidgen · Scott Hale
Poster | Wed 11:00 | The PRISM Alignment Dataset: What Participatory, Representative and Individualised Human Feedback Reveals About the Subjective and Multicultural Alignment of Large Language Models | Hannah Rose Kirk · Alexander Whitefield · Paul Rottger · Andrew M. Bean · Katerina Margatina · Rafael Mosquera-Gomez · Juan Ciro · Max Bartolo · Adina Williams · He He · Bertie Vidgen · Scott Hale
Poster | Wed 11:00 | When Your AIs Deceive You: Challenges of Partial Observability in Reinforcement Learning from Human Feedback | Leon Lang · Davis Foote · Stuart J Russell · Anca Dragan · Erik Jenner · Scott Emmons
Workshop | | Targeted Manipulation and Deception Emerge in LLMs Trained on User* Feedback | Marcus Williams · Micah Carroll · Constantin Weisser · Brendan Murphy · Adhyyan Narang · Anca Dragan
Poster | Thu 16:30 | Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning | Sriyash Poddar · Yanming Wan · Hamish Ivison · Abhishek Gupta · Natasha Jaques
Workshop | | Personalizing Reinforcement Learning from Human Feedback with Variational Preference Learning | Sriyash Poddar · Yanming Wan · Hamish Ivison · Abhishek Gupta · Natasha Jaques