Timezone: »

An Improved Analysis of Training Over-parameterized Deep Neural Networks
Difan Zou · Quanquan Gu

Tue Dec 10 10:45 AM -- 12:45 PM (PST) @ East Exhibition Hall B + C #135
A recent line of research has shown that gradient-based algorithms with random initialization can converge to the global minima of the training loss for over-parameterized (i.e., sufficiently wide) deep neural networks. However, the condition on the width of the neural network to ensure the global convergence is very stringent, which is often a high-degree polynomial in the training sample size $n$ (e.g., $O(n^{24})$). In this paper, we provide an improved analysis of the global convergence of (stochastic) gradient descent for training deep neural networks, which only requires a milder over-parameterization condition than previous work in terms of the training sample size and other problem-dependent parameters. The main technical contributions of our analysis include (a) a tighter gradient lower bound that leads to a faster convergence of the algorithm, and (b) a sharper characterization of the trajectory length of the algorithm. By specializing our result to two-layer (i.e., one-hidden-layer) neural networks, it also provides a milder over-parameterization condition than the best-known result in prior work.

Author Information

Difan Zou (University of California, Los Angeles)
Quanquan Gu (UCLA)

More from the Same Authors