

Without-Replacement Sampling for Stochastic Gradient Methods

Ohad Shamir

Area 5+6+7+8 #174

Keywords: [ Stochastic Methods ] [ Convex Optimization ] [ Online Learning ] [ Learning Theory ]


Stochastic gradient methods for machine learning and optimization problems are usually analyzed under the assumption that data points are sampled with replacement. In contrast, sampling without replacement is far less understood, yet in practice it is very common, often easier to implement, and usually performs better. In this paper, we provide competitive convergence guarantees for without-replacement sampling under several scenarios, focusing on the natural regime of few passes over the data. Moreover, we describe a useful application of these results in the context of distributed optimization with randomly partitioned data, yielding a nearly optimal algorithm for regularized least squares (in terms of both communication complexity and runtime complexity) under broad parameter regimes. Our proof techniques combine ideas from stochastic optimization, adversarial online learning, and transductive learning theory, and can potentially be applied to other stochastic optimization and learning problems.
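
To make the distinction between the two sampling schemes concrete, the following is a minimal sketch and not the paper's algorithm or experimental setup. It runs plain SGD on a one-dimensional least-squares objective and differs only in how indices are drawn. The function name `sgd_least_squares`, the toy data, the step size, and the choice to draw a fresh permutation on every pass are all assumptions made for illustration; the abstract does not specify any of them.

```python
import random

def sgd_least_squares(xs, ys, passes, step, with_replacement, seed=0):
    """SGD on f(w) = (1/n) * sum_i 0.5 * (w * x_i - y_i)^2 for a scalar w.

    Hypothetical helper for illustration. Returns the final iterate and
    the list of data indices visited, in order.
    """
    rng = random.Random(seed)
    n = len(xs)
    w = 0.0
    visited = []
    for _ in range(passes):
        if with_replacement:
            # Each step draws an index independently and uniformly,
            # so a point may be seen several times or not at all.
            order = [rng.randrange(n) for _ in range(n)]
        else:
            # Without replacement: a random permutation of the data,
            # so every point is used exactly once per pass.
            order = list(range(n))
            rng.shuffle(order)
        for i in order:
            grad = (w * xs[i] - ys[i]) * xs[i]  # gradient of the i-th term
            w -= step * grad
            visited.append(i)
    return w, visited

# Toy data, made up for illustration: y = 3 * x exactly.
xs = [0.5, 1.0, 1.5, 2.0, -1.0, -0.5]
ys = [3.0 * x for x in xs]

w_without, idx_without = sgd_least_squares(
    xs, ys, passes=1, step=0.1, with_replacement=False)
w_with, idx_with = sgd_least_squares(
    xs, ys, passes=1, step=0.1, with_replacement=True)
```

In the without-replacement run, `idx_without` is a permutation of the indices, which is why this scheme is often implemented as a single shuffle followed by a sequential scan. The with-replacement run visits the same number of indices but carries no such guarantee.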
