

Poster

Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers

Gavia Gray ⋅ Aman Tiwari ⋅ Shane Bergsma ⋅ Joel Hestness
2024 Poster
