

Interpreting and Steering LLMs with Mutual Information-based Explanations on Sparse Autoencoders

Xuansheng Wu ⋅ Jiayi Yuan ⋅ Wenlin Yao ⋅ Xiaoming Zhai ⋅ Ninghao Liu

Abstract
