firstbacksecondback
1 Results
Workshop
|
Reward Model Underspecification in Language Model Alignment Jacob Eisenstein · Jonathan Berant · Chirag Nagpal · Alekh Agarwal · Ahmad Beirami · Alexander D'Amour · Krishnamurthy Dvijotham · Katherine Heller · Stephen Pfohl · Deepak Ramachandran |