Muon: Training and Trade-offs with Latent Attention and MoE

Sushant Mehta · Raj Dandekar · Rajat Dandekar · Sreedath Panat

Abstract
