

Poster

Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?

Richard Ren ⋅ Steven Basart ⋅ Adam Khoja ⋅ Alice Gatti ⋅ Long Phan ⋅ Xuwang Yin ⋅ Mantas Mazeika ⋅ Alexander Pan ⋅ Gabriel Mukobi ⋅ Ryan Kim ⋅ Stephen Fitz ⋅ Dan Hendrycks
2024 Poster

Abstract
