Workshop: OPT2020: Optimization for Machine Learning

Invited speaker: SGD without replacement: optimal rate analysis and more, Suvrit Sra

Suvrit Sra


Stochastic gradient descent (SGD) is the workhorse of machine learning. There are two fundamental versions of SGD: (i) those that pick stochastic gradients with replacement, and (ii) those that pick without replacement. Ironically, version (ii) is what is used in practice (across most ML toolkits), while version (i) is what almost all published work analyzes. This mismatch is well-known. It arises because analyzing SGD without replacement involves biased gradients and must cope with lack of independence between the stochastic gradients used. In this talk, I will present recent progress on analyzing without replacement SGD, the bulk of which will focus on minimax optimal convergence rates. The rates are obtained without assuming componentwise convexity. I will mention further refinements of the results assuming this additional convexity, which remove drawbacks common to previous works (such as large number of epochs required)