
Workshop: OPT 2023: Optimization for Machine Learning

Understanding the Role of Optimization in Double Descent

Chris Liu · Jeffrey Flanigan


The phenomenon of model-wise double descent, where the test error peaks and then decreases as the model size increases, has attracted the attention of researchers due to the striking observed gap between theory and practice \citep{Belkin2018ReconcilingMM}. While double descent has been observed in various tasks and architectures, the peak can sometimes be noticeably absent or diminished, even without explicit regularization such as weight decay or early stopping. In this paper, we investigate this phenomenon from the perspective of optimization and propose a simple optimization-based explanation for why double descent sometimes occurs weakly or not at all. To the best of our knowledge, we are the first to demonstrate that many disparate factors contributing to model-wise double descent are unified from the viewpoint of optimization: model-wise double descent is observed if and only if the optimizer is able to find a sufficiently low-loss minimum. We conduct a series of controlled experiments on random feature models and two-layer neural networks under various optimization settings, demonstrating this optimization-based unified view. Our results suggest that double descent is unlikely to be a problem for real-world machine learning setups. Additionally, our results help explain the gap between weak double descent peaks in practice and strong peaks observable in carefully designed setups.
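The kind of controlled experiment described above can be illustrated with a toy random feature regression. The sketch below is not the authors' setup: the linear teacher, the ReLU features, the sizes, the noise level, the step counts, and the helper names (`relu_features`, `fit_gd`, `sweep`) are all assumptions chosen for illustration. It fits each width three ways (a short gradient descent run, a long one, and the closed-form minimum-norm least-squares solution) so that train and test error can be compared across model sizes and across how well the optimizer minimizes the training loss.

```python
import numpy as np


def relu_features(X, W):
    """Random ReLU features: phi(x) = max(0, W^T x). W is fixed, not trained."""
    return np.maximum(X @ W, 0.0)


def mse(Phi, theta, y):
    """Mean squared error of the linear readout theta on features Phi."""
    return float(np.mean((Phi @ theta - y) ** 2))


def fit_gd(Phi, y, steps):
    """Plain gradient descent on (1/2n)||Phi theta - y||^2, starting from zero.

    The step size is 1/L, where L = sigma_max(Phi)^2 / n is the largest
    eigenvalue of the Hessian, so the training loss never increases.
    """
    n, p = Phi.shape
    lr = n / np.linalg.norm(Phi, 2) ** 2
    theta = np.zeros(p)
    for _ in range(steps):
        grad = Phi.T @ (Phi @ theta - y) / n
        theta -= lr * grad
    return theta


def sweep(widths=(10, 20, 40, 80, 160), n_train=40, n_test=500, d=10,
          noise=0.5, short_steps=20, long_steps=5000, seed=0):
    """Return, per width, (train MSE, test MSE) for each fitting procedure."""
    rng = np.random.default_rng(seed)
    w_star = rng.normal(size=d) / np.sqrt(d)            # hypothetical linear teacher
    X_tr = rng.normal(size=(n_train, d))
    X_te = rng.normal(size=(n_test, d))
    y_tr = X_tr @ w_star + noise * rng.normal(size=n_train)  # noisy training labels
    y_te = X_te @ w_star                                      # clean test labels
    rows = []
    for p in widths:
        W = rng.normal(size=(d, p)) / np.sqrt(d)
        Phi_tr, Phi_te = relu_features(X_tr, W), relu_features(X_te, W)
        fits = {
            "gd_short": fit_gd(Phi_tr, y_tr, short_steps),
            "gd_long": fit_gd(Phi_tr, y_tr, long_steps),
            # lstsq returns the minimum-norm least-squares solution
            "min_norm": np.linalg.lstsq(Phi_tr, y_tr, rcond=None)[0],
        }
        row = {"width": p}
        for name, theta in fits.items():
            row[name] = (mse(Phi_tr, theta, y_tr), mse(Phi_te, theta, y_te))
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in sweep():
        print(row)
```

Running the script prints one row per width. Comparing the test errors near `width == n_train` for the three procedures is one way to probe, in this toy setting, whether a peak appears only when the training loss is driven low enough; the outcome depends on the assumed noise level and step counts and should not be read as a reproduction of the paper's results.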
