NeurIPS Poster Bottleneck Structure in Learned Features: Low-Dimension vs Regularity Tradeoff


Arthur Jacot

Great Hall & Hall B1+B2 (level 1) #1727


Abstract: Previous work has shown that DNNs with large depth $L$ and $L_{2}$-regularization are biased towards learning low-dimensional representations of the inputs, which can be interpreted as minimizing a notion of rank $R^{(0)}(f)$ of the learned function $f$, conjectured to be the Bottleneck rank. We compute finite depth corrections to this result, revealing a measure $R^{(1)}$ of regularity which bounds the pseudo-determinant of the Jacobian $\left|Jf(x)\right|_{+}$ and is subadditive under composition and addition. This formalizes a balance between learning low-dimensional representations and minimizing complexity/irregularity in the feature maps, allowing the network to learn the `right' inner dimension. Finally, we prove the conjectured bottleneck structure in the learned features as $L\to\infty$: for large depths, almost all hidden representations are approximately $R^{(0)}(f)$-dimensional, and almost all weight matrices $W_{\ell}$ have $R^{(0)}(f)$ singular values close to 1 while the others are $O(L^{-\frac{1}{2}})$. Interestingly, the use of large learning rates is required to guarantee an order $O(L)$ NTK which in turn guarantees infinite depth convergence of the representations of almost all layers.
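As a rough illustration of the pseudo-determinant mentioned in the abstract, the sketch below assumes the standard definition: the product of the nonzero singular values of a matrix. The helper name `pseudo_determinant`, the tolerance `tol`, and the example Jacobian are hypothetical choices made for illustration and are not taken from the paper.

```python
import numpy as np


def pseudo_determinant(J, tol=1e-10):
    """Return (product of singular values above tol, number of such values).

    Hypothetical helper: `tol` is an assumed cutoff for treating a
    singular value as zero, not a value specified by the paper.
    """
    s = np.linalg.svd(J, compute_uv=False)
    nonzero = s[s > tol]
    return float(np.prod(nonzero)), int(nonzero.size)


# Made-up Jacobian of a map R^4 -> R^3 that factors through a
# 2-dimensional bottleneck, so its rank is at most 2.
A = np.array([[2.0, 0.0],
              [0.0, 3.0],
              [0.0, 0.0]])            # 3x2 "decoder" part
B = np.array([[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0]])  # 2x4 "encoder" part
J = A @ B

value, rank = pseudo_determinant(J)
print(value, rank)
```

For this example the nonzero singular values are 3 and 2, so the pseudo-determinant is 6 and the rank is 2, whereas the ordinary determinant is not even defined for a non-square Jacobian.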
