Poster

Measuring Progress in Dictionary Learning for Language Model Interpretability with Board Game Models

Adam Karvonen ⋅ Benjamin Wright ⋅ Can Rager ⋅ Rico Angell ⋅ Jannik Brinkmann ⋅ Logan Smith ⋅ Claudio Mayrink Verdun ⋅ David Bau ⋅ Samuel Marks
2024 Poster

Abstract
