

JoMA: Demystifying Multilayer Transformers via JOint Dynamics of MLP and Attention

Yuandong Tian ⋅ Yiping Wang ⋅ Zhenyu Zhang ⋅ Beidi Chen ⋅ Simon Du

Abstract
