

Heavy-Tailed Class Imbalance and Why Adam Outperforms Gradient Descent on Language Models

Frederik Kunstner ⋅ Robin Yadav ⋅ Alan Milligan ⋅ Mark Schmidt ⋅ Alberto Bietti

Abstract
