Poster
Global Convergence of Gradient Descent for Deep Linear Residual Networks
Lei Wu · Qingcan Wang · Chao Ma
Thu Dec 12 05:00 PM -- 07:00 PM (PST) @ East Exhibition Hall B + C #201
We analyze the global convergence of gradient descent for deep linear residual
networks by proposing a new initialization: zero-asymmetric (ZAS)
initialization, which is motivated by the need to avoid the stable manifolds of
saddle points. We prove that under the ZAS initialization, for an arbitrary
target matrix, gradient descent converges to an $\varepsilon$-optimal point in
$O\left( L^3 \log(1/\varepsilon) \right)$ iterations, which scales polynomially
with the network depth $L$. Our result, together with the $\exp(\Omega(L))$
convergence time for the standard initialization (Xavier or near-identity)
\cite{shamir2018exponential}, demonstrates the importance of both the residual
structure and the initialization in the optimization of deep linear neural
networks, especially when $L$ is large.
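To make the setting concrete, below is a minimal PyTorch sketch of plain gradient descent on a deep linear residual network $f(x) = W_{\mathrm{out}}(I + W_L)\cdots(I + W_1)x$ fit to an arbitrary target matrix $\Phi$. The zero initialization of the residual blocks reflects the "zero" part of ZAS; the identity output layer, the depth-scaled step size, and the problem sizes are illustrative assumptions, not the paper's exact ZAS construction or step-size choice.

# Minimal sketch (not the paper's exact setup): a deep linear residual network
#   f(x) = W_out (I + W_L) ... (I + W_1) x
# with all residual blocks initialized to zero, trained by vanilla gradient
# descent to fit an arbitrary target matrix Phi. The output-layer and
# step-size choices below are illustrative assumptions.
import torch

torch.manual_seed(0)
d, L, n = 10, 20, 200            # width, depth, number of samples (assumed)
Phi = torch.randn(d, d)          # arbitrary target matrix
X = torch.randn(d, n)            # inputs
Y = Phi @ X                      # targets

# Residual blocks start at zero, so the network starts as the identity map.
Ws = [torch.zeros(d, d, requires_grad=True) for _ in range(L)]
# Output layer initialized to the identity (illustrative choice, not ZAS per se).
W_out = torch.eye(d, requires_grad=True)

params = Ws + [W_out]
lr = 0.1 / L                     # heuristic step size, scaled down with depth

for step in range(5000):
    h = X
    for W in Ws:                 # h <- (I + W_l) h
        h = h + W @ h
    loss = 0.5 * ((W_out @ h - Y) ** 2).sum() / n
    loss.backward()
    with torch.no_grad():
        for p in params:         # vanilla gradient descent update
            p -= lr * p.grad
            p.grad.zero_()

print(f"loss after {step + 1} GD steps: {loss.item():.3e}")

The intent, per the abstract, is that such an initialization avoids the stable manifolds of saddle points, so plain gradient descent can reach an $\varepsilon$-optimal point in a number of iterations that is only polynomial in the depth $L$.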
Author Information
Lei Wu (Princeton University)
Qingcan Wang (Program in Applied and Computational Mathematics, Princeton University)
Chao Ma (Princeton University)
More from the Same Authors
- 2021 Spotlight: On Linear Stability of SGD and Input-Smoothness of Neural Networks
  Chao Ma · Lexing Ying
- 2021 Poster: On Linear Stability of SGD and Input-Smoothness of Neural Networks
  Chao Ma · Lexing Ying
- 2018 Poster: How SGD Selects the Global Minima in Over-parameterized Learning: A Dynamical Stability Perspective
  Lei Wu · Chao Ma · Weinan E