Diverse and Effective Red Teaming with Auto-generated Rewards and Multi-step Reinforcement Learning

Alex Beutel ⋅ Kai Xiao ⋅ Johannes Heidecke ⋅ Lilian Weng

Abstract
