Poster

Rule Based Rewards for Language Model Safety

Tong Mu ⋅ Alec Helyar ⋅ Johannes Heidecke ⋅ Joshua Achiam ⋅ Andrea Vallone ⋅ Ian Kivlichan ⋅ Molly Lin ⋅ Alex Beutel ⋅ John Schulman ⋅ Lilian Weng
2024 Poster
