
Workshop: OPT 2023: Optimization for Machine Learning

SGD batch saturation for training wide neural networks

Chaoyue Liu · Dmitriy Drusvyatskiy · Misha Belkin · Damek Davis · Yian Ma


The performance of the mini-batch stochastic gradient method depends strongly on the batch size that is used. In the classical convex setting with interpolation, prior work showed that increasing the batch size linearly increases the convergence speed, but only up to a point: once the batch size exceeds a certain threshold (the critical batch size), further increases yield only negligible improvement. The goal of this work is to investigate the relationship between batch size and convergence speed for a broader class of nonconvex problems. Building on recent improved convergence guarantees for SGD, we prove that a similar linear-scaling and batch-size saturation phenomenon occurs when training sufficiently wide neural networks. We conduct a number of numerical experiments on benchmark datasets, which corroborate our findings.
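The saturation phenomenon described above can be illustrated on a toy problem. The sketch below (not the paper's neural-network setting; an overparameterized least-squares problem in the interpolation regime, with hypothetical step-size and batch-size choices) runs mini-batch SGD at several batch sizes, scaling the step size linearly with the batch size up to a cap, and counts iterations to reach a fixed loss tolerance:

```python
import numpy as np

rng = np.random.default_rng(0)

# Overparameterized least squares: n samples, d > n features, so an
# interpolating solution exists and the loss can be driven to zero.
n, d = 20, 100
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star  # labels are exactly realizable (interpolation)

def iters_to_tol(batch_size, lr, tol=1e-3, max_iters=20000):
    """Mini-batch SGD on the mean-squared loss; iterations to reach tol."""
    x = np.zeros(d)
    for t in range(max_iters):
        idx = rng.choice(n, size=batch_size, replace=False)
        grad = A[idx].T @ (A[idx] @ x - b[idx]) / batch_size
        x -= lr * grad
        if np.mean((A @ x - b) ** 2) < tol:
            return t + 1
    return max_iters

# Classical linear-scaling rule with a cap: lr grows with the batch size
# until a stability threshold, after which larger batches stop helping.
results = {B: iters_to_tol(B, lr=min(0.005 * B, 0.05))
           for B in [1, 2, 4, 8, 16, 20]}
for B, T in results.items():
    print(f"batch={B:2d}  iters={T}")
```

With the step size capped, the iteration counts for the largest batch sizes should cluster together, while small batches need proportionally more iterations: a crude analogue of the critical-batch-size effect the abstract describes.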
