Tutorial
Mon Dec 05 05:30 AM -- 07:30 AM (PST) @ Rooms 211 + 212
Large-Scale Optimization: Beyond Stochastic Gradient Descent and Convexity
Suvrit Sra · Francis Bach

Stochastic optimization lies at the heart of machine learning, and its cornerstone is stochastic gradient descent (SGD), a staple introduced over 60 years ago! Recent years have, however, brought an exciting new development: variance reduction (VR) for stochastic methods. These VR methods excel in settings where more than one pass through the training data is allowed, achieving faster convergence than SGD in theory as well as in practice. These speedups underlie the huge surge of interest in VR methods; by now a large body of work has emerged, and new results appear regularly! This tutorial brings the key principles behind VR methods to the wider machine learning audience by positioning them vis-à-vis SGD. Moreover, the tutorial takes a step beyond convexity and covers cutting-edge results for non-convex problems too, while outlining key points and as-yet-open challenges.
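To make the variance-reduction idea mentioned above concrete, here is a minimal sketch (not taken from the tutorial materials) contrasting plain SGD with SVRG, one representative VR method, on a synthetic least-squares problem. The data, step sizes, and iteration counts are arbitrary choices for illustration only.

```python
# Illustrative sketch: SGD vs. SVRG on least squares, 0.5/n * ||Ax - b||^2.
# All problem sizes and hyperparameters below are assumptions for demonstration.
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 20
A = rng.standard_normal((n, d))
x_true = rng.standard_normal(d)
b = A @ x_true + 0.01 * rng.standard_normal(n)

def grad_i(x, i):
    # Gradient of the i-th component f_i(x) = 0.5 * (a_i^T x - b_i)^2
    return (A[i] @ x - b[i]) * A[i]

def full_grad(x):
    # Gradient of the average objective (1/n) * sum_i f_i(x)
    return A.T @ (A @ x - b) / n

def sgd(x0, steps, lr):
    x = x0.copy()
    for t in range(steps):
        i = rng.integers(n)
        # Decaying step size: needed because the gradient noise never vanishes
        x -= lr / (1 + t) * grad_i(x, i)
    return x

def svrg(x0, epochs, inner_steps, lr):
    x_tilde = x0.copy()
    for _ in range(epochs):
        mu = full_grad(x_tilde)          # full gradient at the snapshot point
        x = x_tilde.copy()
        for _ in range(inner_steps):
            i = rng.integers(n)
            # Variance-reduced estimate: unbiased, and its variance shrinks
            # as both x and x_tilde approach the optimum
            g = grad_i(x, i) - grad_i(x_tilde, i) + mu
            x -= lr * g                  # a constant step size suffices here
        x_tilde = x
    return x_tilde

x0 = np.zeros(d)
print("SGD  error:", np.linalg.norm(sgd(x0, 5000, 0.01) - x_true))
print("SVRG error:", np.linalg.norm(svrg(x0, 5, n, 0.005) - x_true))
```

The key structural difference is that SVRG periodically computes a full gradient at a snapshot and uses it to correct each stochastic gradient, which removes the need for a decaying step size in this multi-pass setting.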

Learning Objectives:

– Introduce fast stochastic methods to the wider ML audience, going beyond a 60-year-old algorithm (SGD)

– Provide a guiding light through this fast-moving area: unify and simplify its presentation, outline common pitfalls, and demystify its capabilities

– Raise awareness about open challenges in the area, and thereby spur future research

Target Audience:

– Graduate students (master's as well as PhD)

– ML researchers in academia and industry who are not experts in stochastic optimization

– Practitioners who want to widen their repertoire of tools