Resisting RL Elicitation of Biosecurity Capabilities: Reasoning Models Exploration Hacking on WMDP

Joschka Braun ⋅ Yeonwoo Jang ⋅ Damon Falck ⋅ Roland S. Zimmermann ⋅ David Lindner ⋅ Scott Emmons
