
Linear attention is (maybe) all you need (to understand transformer optimization)

Kwangjun Ahn ⋅ Xiang Cheng ⋅ Minhak Song ⋅ Chulhee Yun ⋅ Ali Jadbabaie ⋅ Suvrit Sra

Abstract
