

Poster

Compact Proofs of Model Performance via Mechanistic Interpretability

Jason Gross ⋅ Rajashree Agrawal ⋅ Thomas Kwa ⋅ Euan Ong ⋅ Chun Hei Yip ⋅ Alex Gibson ⋅ Soufiane Noubir ⋅ Lawrence Chan
2024 Poster

Abstract

