Timezone: »

Invited speaker: SGD without replacement: optimal rate analysis and more, Suvrit Sra
Suvrit Sra

Fri Dec 11 09:00 AM -- 09:20 AM (PST) @

Stochastic gradient descent (SGD) is the workhorse of machine learning. There are two fundamental versions of SGD: (i) those that pick stochastic gradients with replacement, and (ii) those that pick without replacement. Ironically, version (ii) is what is used in practice (across most ML toolkits), while version (i) is what almost all published work analyzes. This mismatch is well-known. It arises because analyzing SGD without replacement involves biased gradients and must cope with lack of independence between the stochastic gradients used. In this talk, I will present recent progress on analyzing without replacement SGD, the bulk of which will focus on minimax optimal convergence rates. The rates are obtained without assuming componentwise convexity. I will mention further refinements of the results assuming this additional convexity, which remove drawbacks common to previous works (such as large number of epochs required)

Author Information

Suvrit Sra (MIT)

Suvrit Sra is a faculty member within the EECS department at MIT, where he is also a core faculty member of IDSS, LIDS, MIT-ML Group, as well as the statistics and data science center. His research spans topics in optimization, matrix theory, differential geometry, and probability theory, which he connects with machine learning --- a key focus of his research is on the theme "Optimization for Machine Learning” (http://opt-ml.org)

More from the Same Authors