Skip to yearly menu bar Skip to main content

Workshop: Mathematics of Modern Machine Learning (M3L)

Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit

Blake Bordelon · Lorenzo Noci · Mufan Li · Boris Hanin · Cengiz Pehlevan

Abstract: We study residual networks with a residual branch scale of $1/\sqrt{\text{depth}}$ in combination with the $\mu$P parameterization.We provide experiments demonstrating that residual architectures including convolutional ResNets and Vision Transformers trained with this parameterization exhibit transfer of optimal hyperparameters across width and depth on CIFAR-10 and ImageNet. Furthermore, using recent developments in the dynamical mean field theory (DMFT) description of neural network learning dynamics, we show that this parameterization of ResNets admits a well-defined feature learning joint infinite-width and infinite-depth limit and show convergence of finite-size network dynamics towards this limit.

Chat is not available.