
Non-Gaussian Tensor Programs
Eugene Golikov · Greg Yang

Thu Dec 01 09:00 AM -- 11:00 AM (PST) @ Hall J #535

The Tensor Programs framework has produced a series of powerful results by 1) expressing any deep learning computation of concern as a principled composition of element-wise nonlinearities and matrix multiplication, and 2) inductively reasoning about the program behavior as the sizes of the matrices in the program tend to infinity. For example, this framework helped to show that infinitely wide neural networks exhibit Gaussian process behavior at initialization and evolve like a kernel model during training in the so-called NTK parameterization (Yang, 2019b, 2020a; Yang and Littwin, 2021). Moreover, this framework yielded a novel parameterization, coined μP (Yang and Hu, 2021), that for the first time enabled hyperparameter tuning for enormous networks too expensive to train more than once (Yang et al., 2022). However, this framework has so far been limited to Gaussian initialized weights, while uniform or truncated Gaussian distributions are more prevalent in practice. This work extends Tensor Programs to general non-Gaussian weights, thus recovering all of the above results in all practical settings.
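The abstract states that, with non-Gaussian weights, wide networks still behave like Gaussian processes at initialization. The snippet below is only a small Monte Carlo sanity check of that claim for a one-hidden-layer tanh network. It is not the authors' method, and it covers none of the general Tensor Programs machinery. It assumes unit-variance weights with 1/sqrt(fan_in) layer multipliers (NTK-style scaling) and compares Gaussian weights with variance-matched uniform weights on [-sqrt(3), sqrt(3)]. The helper names (`sample_weight`, `net_output`, `output_stats`) and the fixed input are hypothetical, chosen only for illustration.

```python
import math
import random
import statistics


def sample_weight(dist: str, rng: random.Random) -> float:
    """Draw one weight with mean 0 and variance 1 from the named distribution."""
    if dist == "gaussian":
        return rng.gauss(0.0, 1.0)
    if dist == "uniform":
        # Uniform on [-sqrt(3), sqrt(3)] has variance (2*sqrt(3))**2 / 12 = 1.
        a = math.sqrt(3.0)
        return rng.uniform(-a, a)
    raise ValueError(f"unknown distribution: {dist}")


def net_output(x, width: int, dist: str, rng: random.Random) -> float:
    """Scalar output of a freshly initialized 1-hidden-layer tanh network.

    Every weight has unit variance, and each layer's sum is divided by
    sqrt(fan_in) (NTK-style scaling, an assumption of this sketch).
    """
    d = len(x)
    total = 0.0
    for _ in range(width):
        # Pre-activation of one hidden unit: (1/sqrt(d)) * sum_j w_j * x_j.
        pre = sum(sample_weight(dist, rng) * xj for xj in x) / math.sqrt(d)
        # Readout weight times the hidden activation.
        total += sample_weight(dist, rng) * math.tanh(pre)
    return total / math.sqrt(width)


def output_stats(dist: str, width: int = 32, trials: int = 2000, seed: int = 0):
    """Mean, variance and excess kurtosis of the output over random initializations."""
    rng = random.Random(seed)
    x = [1.0] * 16  # hypothetical fixed input, so each pre-activation has variance 1
    outs = [net_output(x, width, dist, rng) for _ in range(trials)]
    mean = statistics.fmean(outs)
    var = statistics.pvariance(outs, mu=mean)
    # Excess kurtosis is 0 for a Gaussian.
    kurt = statistics.fmean([(o - mean) ** 4 for o in outs]) / var ** 2 - 3.0
    return mean, var, kurt
```

If the Gaussian-process picture holds, `output_stats("gaussian")` and `output_stats("uniform")` should both report a mean near 0 and an excess kurtosis near 0. Their variances should also be similar, roughly E[tanh(Z)^2] ≈ 0.39 for a standard normal Z. These are finite-width, finite-sample estimates, so expect some Monte Carlo noise.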

Author Information

Eugene Golikov (École polytechnique fédérale de Lausanne)

MSc in Fluid Mechanics @ Moscow SU. MSc in Computer Science @ HSE. Doing a PhD in DL theory @ EPFL.

Greg Yang (Microsoft Research)
