`

Timezone: »

Poster
Analytic Characterization of the Hessian in Shallow ReLU Models: A Tale of Symmetry
Yossi Arjevani · Michael Field

Wed Dec 09 09:00 AM -- 11:00 AM (PST) @ Poster Session 3 #1111
We consider the optimization problem associated with fitting two-layers ReLU networks with respect to the squared loss, where labels are generated by a target network. We leverage the rich symmetry structure to analytically characterize the Hessian at various families of spurious minima in the natural regime where the number of inputs $d$ and the number of hidden neurons $k$ is finite. In particular, we prove that for $d\ge k$ standard Gaussian inputs: (a) of the $dk$ eigenvalues of the Hessian, $dk - O(d)$ concentrate near zero, (b) $\Omega(d)$ of the eigenvalues grow linearly with $k$. Although this phenomenon of extremely skewed spectrum has been observed many times before, to our knowledge, this is the first time it has been established {rigorously}. Our analytic approach uses techniques, new to the field, from symmetry breaking and representation theory, and carries important implications for our ability to argue about statistical generalization through local curvature.