
Workshop: Mathematics of Modern Machine Learning (M3L)

Why Do We Need Weight Decay for Overparameterized Deep Networks?

Maksym Andriushchenko · Francesco D'Angelo · Aditya Vardhan Varre · Nicolas Flammarion


Weight decay is a widely used technique for training state-of-the-art deep networks. Despite its widespread usage, its role remains poorly understood. In this work, we highlight that the role of weight decay in modern deep learning is different from its regularization effect studied in classical learning theory. For overparameterized deep networks, we show how weight decay modifies the optimization dynamics, enhancing the ever-present implicit regularization of SGD via loss stabilization.
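As a minimal point of reference for the mechanism discussed above, the standard SGD update with (coupled) weight decay can be sketched as follows. This is a generic illustration, assuming the usual L2-coupled form of the update; the step size, decay coefficient, and function name are illustrative and not taken from the paper's experimental setup:

```python
import numpy as np

def sgd_weight_decay_step(w, grad, lr=0.1, wd=1e-4):
    """One SGD step with coupled weight decay.

    The extra term lr * wd * w shrinks the weights toward zero at
    every update, in addition to following the loss gradient.
    (Illustrative sketch, not the authors' training configuration.)
    """
    return w - lr * (grad + wd * w)

# Toy check: with a zero loss gradient, weight decay alone
# contracts the weight norm geometrically at each step.
w = np.ones(3)
for _ in range(100):
    w = sgd_weight_decay_step(w, grad=np.zeros(3), lr=0.1, wd=0.5)
print(np.linalg.norm(w))  # much smaller than the initial norm sqrt(3)
```

Note that this coupled form (decay folded into the gradient) is equivalent to adding an L2 penalty to the loss under plain SGD; the paper's point is that for overparameterized networks the interesting effect is on the optimization trajectory rather than on the classical penalized objective.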
