Workshop: OPT 2023: Optimization for Machine Learning

Noise Injection Irons Out Local Minima and Saddle Points

Konstantin Mishchenko · Sebastian Stich


Non-convex optimization problems are ubiquitous in machine learning, especially in deep learning. It has been observed in practice that injecting artificial noise into stochastic gradient descent (SGD) can sometimes improve training and generalization performance. In this work, we formalize noise injection as a smoothing operator and review and derive convergence guarantees of SGD under smoothing. Empirically, we found that Gaussian smoothing works well for training two-layer neural networks, but these findings do not translate to deeper nets. We would like to use this contribution to stimulate a discussion in the community to further investigate the impact of noise in training machine learning models.
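
To make the smoothing view concrete, here is a minimal sketch of noise injection on a one-dimensional toy problem. It is not the authors' experimental setup. It assumes the common Gaussian form of smoothing, f_sigma(x) = E[f(x + sigma * u)] with u ~ N(0, 1), whose gradient can be estimated by evaluating the gradient of f at randomly perturbed points. The objective `f`, the helper `gradient_descent`, and all parameter values (`lr`, `steps`, `sigma`, `n_samples`) are hypothetical choices made for illustration only.

```python
import numpy as np

# Hypothetical non-convex toy objective (not from the paper):
# a quadratic bowl with sinusoidal ripples that create local minima.
def f(x):
    return 0.5 * x**2 + 1.5 * np.sin(3.0 * x)

def grad_f(x):
    return x + 4.5 * np.cos(3.0 * x)

def gradient_descent(x0, lr=0.05, steps=500, sigma=0.0, n_samples=32, seed=0):
    """Gradient descent with Gaussian noise injected into the iterate.

    Each step averages grad_f over n_samples perturbed points x + sigma * u,
    which is a Monte Carlo estimate of the gradient of the smoothed
    objective E[f(x + sigma * u)]. With sigma = 0 this is plain
    gradient descent on f.
    """
    rng = np.random.default_rng(seed)
    x = float(x0)
    for _ in range(steps):
        u = rng.standard_normal(n_samples)
        g = np.mean(grad_f(x + sigma * u))
        x -= lr * g
    return float(x)

x_plain = gradient_descent(3.0, sigma=0.0)   # no noise injection
x_smooth = gradient_descent(3.0, sigma=1.0)  # Gaussian smoothing

print("plain GD:    x =", x_plain, " f(x) =", f(x_plain))
print("smoothed GD: x =", x_smooth, " f(x) =", f(x_smooth))
```

In this toy setting the plain run stalls in a ripple-induced local minimum close to its starting point, while the noise-injected run descends the nearly convex smoothed landscape toward the bottom of the bowl. Note that the minimizer of the smoothed objective generally differs from that of f itself, so the noisy iterate ends in a better basin rather than exactly at the global minimum; shrinking `sigma` over time is one possible remedy, also an assumption here.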
