Poster in Workshop: Safe Generative AI

Large Language Model Benchmarks Do Not Test Reliability

Joshua Vendrow ⋅ Edward Vendrow ⋅ Sara Beery ⋅ Aleksander Madry

Abstract
