Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models

Frederik Kunstner · Robin Yadav · Alan Milligan · Mark Schmidt · Alberto Bietti

Abstract
