AI Sandbagging: Language Models can Selectively Underperform on Evaluations

Teun van der Weij · Felix Hofstätter · Oliver Jaffe · Samuel Brown · Francis Ward

Abstract
