Diverse and Effective Red Teaming with Auto-generated Rewards and Multi-step Reinforcement Learning

Alex Beutel · Kai Xiao · Johannes Heidecke · Lilian Weng

Abstract
