
Escaping Local Minima With Stochastic Noise
Harshvardhan Harshvardhan · Sebastian Stich

With the recent advances in deep learning, non-convex problems have received widespread attention. However, while stochastic gradient descent (SGD) can often successfully optimize these non-convex models in practice, the standard theoretical analysis---mapping SGD to gradient descent (GD) in expectation---cannot explain global convergence, as GD gets trapped in local minima and stationary points. We introduce a novel proof framework to analyze stochastic methods on a class of structured non-convex functions. We prove that SGD converges linearly to approximate global minima on non-convex functions that are "close" to a convex-like (strongly convex or Polyak-Łojasiewicz) function when its iterates are perturbed with stochastic noise of appropriate strength (smoothing). Our analysis applies to a strictly larger class of functions than studied in prior work. We demonstrate that special cases of smoothing are equivalent to vanilla SGD in expectation, which explains why SGD can escape local minima in practice.
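The mechanism the abstract describes can be illustrated with a toy one-dimensional sketch. This is not the paper's algorithm or analysis; the objective below (a strongly convex quadratic plus a bounded oscillation, i.e. a function "close" to convex with spurious local minima), the step size, and the linear noise-annealing schedule are all illustrative choices. Plain GD gets stuck in a local minimum, while evaluating the gradient at iterates perturbed by Gaussian noise of decaying strength (smoothing) typically reaches the global basin:

```python
import math
import random

def f(x):
    # Strongly convex quadratic plus a bounded oscillation:
    # "close" to convex, but with spurious local minima.
    return 0.5 * x * x + 2.0 * math.sin(3.0 * x)

def grad(x):
    return x + 6.0 * math.cos(3.0 * x)

def plain_gd(x0, lr=0.05, steps=300):
    # Deterministic gradient descent: converges to the nearest
    # stationary point of its starting basin.
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def smoothed_sgd(x0, seed, lr=0.05, steps=300, sigma0=1.0):
    # Perturbed iterates: the gradient is evaluated at x + xi with
    # Gaussian noise xi whose strength is annealed to zero, so early
    # on the method "sees" a smoothed version of f.
    rng = random.Random(seed)
    x = x0
    for t in range(steps):
        sigma = sigma0 * (1.0 - t / steps)
        x -= lr * grad(x + rng.gauss(0.0, sigma))
    # Polish with a few exact gradient steps inside the final basin.
    for _ in range(100):
        x -= lr * grad(x)
    return x

x0 = 2.0
f_plain = f(plain_gd(x0))  # trapped in a local minimum near x ~ 1.5
f_best = min(f(smoothed_sgd(x0, s)) for s in range(10))  # best of 10 seeds
```

Here the global minimum sits near x ≈ -0.5 with value ≈ -1.87, while the local minimum that traps plain GD has value ≈ -0.83, so the noisy runs reliably attain a strictly lower objective.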

#### Author Information

##### Sebastian Stich (CISPA)

Dr. [Sebastian U. Stich](https://sstich.ch/) is a faculty member at the CISPA Helmholtz Center for Information Security. Research interests:

- *methods for machine learning and statistics*—at the interface of theory and practice
- *collaborative learning* (distributed, federated, and decentralized methods)
- *optimization for machine learning* (adaptive stochastic methods and generalization performance)