Poster

Reducing Transformer Key-Value Cache Size with Cross-Layer Attention

William Brandon ⋅ Mayank Mishra ⋅ Aniruddha Nrusimha ⋅ Rameswar Panda ⋅ Jonathan Ragan-Kelley
2024 Poster

Abstract

