
Workshop: Mathematics of Modern Machine Learning (M3L)

Understanding the Role of Noisy Statistics in the Regularization Effect of Batch Normalization

Atli Kosson · Dongyang Fan · Martin Jaggi


Normalization layers have been shown to benefit the training stability and generalization of deep neural networks in various ways. For Batch Normalization (BN), the noisy statistics have been observed to have a regularization effect that depends on the batch size. Following this observation, Hoffer et al. proposed Ghost Batch Normalization (GBN), where BN is explicitly performed independently on smaller sub-batches, resulting in improved generalization in many settings. In this study, we analyze and isolate the effect of the noisy statistics by comparing BN and GBN and by introducing a noise injection method. We then quantitatively assess the effects of the noise, juxtaposing it with other regularizers such as dropout, and examine its potential role in the generalization gaps between batch normalization and its alternatives, including layer normalization and normalization-free methods.
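The core mechanism behind GBN, as described above, is to normalize each small sub-batch ("ghost batch") with its own statistics, so the statistics noise matches that of a small batch even when the actual batch is large. A minimal NumPy sketch of the training-mode computation is below; the function name is illustrative, and learnable scale/shift parameters and running statistics (used at inference time in the standard formulation) are omitted for brevity.

```python
import numpy as np

def ghost_batch_norm(x, ghost_size, eps=1e-5):
    """Training-mode Ghost Batch Normalization sketch.

    Splits the batch (axis 0) into virtual sub-batches of `ghost_size`
    and normalizes each one with its own per-feature mean and variance.
    Omits the learnable gamma/beta and running statistics of full BN.
    """
    n = x.shape[0]
    assert n % ghost_size == 0, "batch must divide evenly into ghost batches"
    out = np.empty_like(x, dtype=np.float64)
    for start in range(0, n, ghost_size):
        chunk = x[start:start + ghost_size]
        mean = chunk.mean(axis=0)   # statistics come only from this sub-batch,
        var = chunk.var(axis=0)     # so they are noisier than full-batch stats
        out[start:start + ghost_size] = (chunk - mean) / np.sqrt(var + eps)
    return out
```

With `ghost_size` equal to the full batch size this reduces to ordinary BN; shrinking it increases the variance of the per-sub-batch statistics, which is the regularization noise the study isolates.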
