Poster

Tight Nonparametric Convergence Rates for Stochastic Gradient Descent under the Noiseless Linear Model

Raphaël Berthier · Francis Bach · Pierre Gaillard

Poster Session 5 #1390

Keywords: [ Optimization for Deep Networks ] [ Deep Learning ] [ Non-Convex Optimization ] [ Optimization ]


Abstract: In the context of statistical supervised learning, the noiseless linear model assumes that there exists a deterministic linear relation Y = ⟨θ, Φ(U)⟩ between the random output Y and the random feature vector Φ(U), a potentially non-linear transformation of the inputs U. We analyze the convergence of single-pass, fixed step-size stochastic gradient descent on the least-squares risk under this model. The convergence of the iterates to the optimum θ and the decay of the generalization error follow polynomial convergence rates with exponents that both depend on the regularities of the optimum θ and of the feature vectors Φ(U). We interpret our result in the reproducing kernel Hilbert space framework. As a special case, we analyze an online algorithm for estimating a real function on the unit hypercube from the noiseless observation of its value at randomly sampled points; the convergence depends on the Sobolev smoothness of the function and of a chosen kernel. Finally, we apply our analysis beyond the supervised learning setting to obtain convergence rates for the averaging process (a.k.a. gossip algorithm) on a graph depending on its spectral dimension.
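
To make the setting concrete, here is a minimal sketch of single-pass, fixed step-size SGD on the least-squares risk under a noiseless linear model. It is an illustration, not the authors' code: the function name `sgd_noiseless`, the standard Gaussian features, the dimension, and the step size are all assumptions chosen for the example. In this finite-dimensional, well-conditioned toy case the error decays much faster than the polynomial rates studied in the abstract, which concern the nonparametric regime.

```python
import random


def sgd_noiseless(theta_star, n_steps, step_size, seed=0):
    """Single-pass, fixed step-size SGD on the least-squares risk.

    Each step draws a fresh feature vector x (standard Gaussian here, an
    assumption of this sketch) and observes the noiseless output
    y = <theta_star, x>. Returns the final iterate and the squared
    distance to theta_star after every step.
    """
    rng = random.Random(seed)
    d = len(theta_star)
    theta = [0.0] * d  # start from the origin
    errors = []
    for _ in range(n_steps):
        # One fresh sample per step: this is what makes the pass "single".
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        y = sum(t * xi for t, xi in zip(theta_star, x))  # noiseless output
        residual = sum(t * xi for t, xi in zip(theta, x)) - y
        # Gradient of 0.5 * (<theta, x> - y)^2 with respect to theta is residual * x.
        theta = [t - step_size * residual * xi for t, xi in zip(theta, x)]
        errors.append(sum((t - s) ** 2 for t, s in zip(theta, theta_star)))
    return theta, errors


# Hypothetical optimum used only for this illustration.
theta_star = [1.0, -2.0, 0.5, 3.0, -1.5]
theta_hat, errors = sgd_noiseless(theta_star, n_steps=2000, step_size=0.05)
print("squared error after first step:", errors[0])
print("squared error after last step: ", errors[-1])
```

Because the outputs carry no noise, the stochastic gradient vanishes at the optimum, so a fixed step size suffices and no step-size decay or iterate averaging is used in this sketch.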
