Perturbing BatchNorm and Only BatchNorm Benefits Sharpness-Aware Minimization
Maximilian Mueller · Matthias Hein
Event URL: https://openreview.net/forum?id=yL_iq-Q-ORS

We investigate the connection between two popular methods commonly used in training deep neural networks: Sharpness-Aware Minimization (SAM) and Batch Normalization. We find that perturbing only the affine BatchNorm parameters in the adversarial step of SAM benefits generalization performance, while excluding them from the perturbation can substantially decrease performance. We confirm our results across several models and SAM variants on CIFAR-10 and CIFAR-100 and show preliminary results for ImageNet. Our results provide a practical tweak for training deep networks, but also cast doubt on the commonly accepted explanation that SAM works by minimizing a sharpness quantity responsible for generalization.
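
To make the idea of restricting the adversarial step concrete, here is a minimal sketch in plain Python of a SAM-style update in which only a chosen subset of parameters is perturbed. It is an illustration under stated assumptions, not the authors' implementation: the function name `sam_step_subset`, the `bn.` naming convention used to pick out BatchNorm affine parameters, the toy quadratic loss, and the choice to normalize the perturbation by the gradient norm of the perturbed subset alone are all hypothetical choices made for this example.

```python
import math

def sam_step_subset(params, grad_fn, perturb, rho=0.05, lr=0.1):
    """One SAM-style update that perturbs only the parameters selected by `perturb`.

    params:  dict mapping parameter name -> list of floats
    grad_fn: callable taking such a dict and returning gradients in the same layout
    perturb: callable(name) -> bool, True for parameters to perturb
    Returns (updated parameters, perturbed point used for the second gradient).
    """
    grads = grad_fn(params)
    selected = {name for name in params if perturb(name)}

    # Assumption of this sketch: the norm is taken over the perturbed subset only.
    norm = math.sqrt(sum(g * g for name in selected for g in grads[name]))
    scale = rho / (norm + 1e-12)

    # Adversarial (ascent) step: move only the selected parameters.
    perturbed = {}
    for name, values in params.items():
        if name in selected:
            perturbed[name] = [p + scale * g for p, g in zip(values, grads[name])]
        else:
            perturbed[name] = list(values)

    # Descent step: gradient at the perturbed point, applied to the original parameters.
    sam_grads = grad_fn(perturbed)
    updated = {
        name: [p - lr * g for p, g in zip(values, sam_grads[name])]
        for name, values in params.items()
    }
    return updated, perturbed


def toy_grad(params):
    # Gradient of the toy loss 0.5 * sum(p ** 2), which is simply p itself.
    return {name: list(values) for name, values in params.items()}


# Hypothetical parameter names; "bn." marks the BatchNorm affine parameters here.
toy_params = {"conv.weight": [1.0, -2.0], "bn.weight": [3.0], "bn.bias": [4.0]}
new_params, perturbed_point = sam_step_subset(
    toy_params, toy_grad, perturb=lambda name: name.startswith("bn."), rho=0.5, lr=0.1
)
```

In this toy run the convolution weights are left untouched by the adversarial step and only the `bn.` entries are moved before the second gradient is computed. Passing `lambda name: True` as `perturb` would instead perturb every parameter, and `lambda name: not name.startswith("bn.")` would exclude the BatchNorm parameters.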

Author Information

Maximilian Mueller (University of Tübingen)
Matthias Hein (University of Tübingen)
