Modern neural network performance typically improves as model size increases. A recent line of research on the Neural Tangent Kernel (NTK) of over-parameterized networks indicates that this improvement is a product of a better-conditioned loss landscape. In this work, we investigate a form of over-parameterization achieved through ensembling, where we define collegial ensembles (CE) as the aggregation of multiple independent models with identical architectures, trained as a single model. We show that the optimization dynamics of CE simplify dramatically when the number of models in the ensemble is large, resembling the dynamics of wide models, yet scale much more favorably. We use recent theoretical results on the finite width corrections of the NTK to perform efficient architecture search in a space of finite width CE that aims either to minimize capacity or to maximize trainability under a set of constraints. The resulting ensembles can be efficiently implemented in practical architectures using group convolutions and block diagonal layers. Finally, we show how our framework can be used to analytically derive optimal group convolution modules originally found using expensive grid searches, without having to train a single model.
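To illustrate why an ensemble "trained as a single model" maps onto block diagonal layers, here is a minimal NumPy sketch. It assumes each member is a three-layer ReLU MLP and that the ensemble aggregates members by averaging their outputs; the paper's exact architectures and aggregation may differ, and the function names (`ensemble_forward`, `fused_forward`, `block_diagonal`) are hypothetical. It checks that running m separate models gives the same result as one wide model whose hidden-to-hidden weights are block diagonal.

```python
import numpy as np


def relu(z):
    """Elementwise rectified linear unit."""
    return np.maximum(z, 0.0)


def block_diagonal(blocks):
    """Place a list of 2-D arrays on the diagonal of one zero matrix."""
    rows = sum(b.shape[0] for b in blocks)
    cols = sum(b.shape[1] for b in blocks)
    out = np.zeros((rows, cols))
    r = c = 0
    for b in blocks:
        out[r:r + b.shape[0], c:c + b.shape[1]] = b
        r += b.shape[0]
        c += b.shape[1]
    return out


def ensemble_forward(x, w1s, w2s, w3s):
    """Run m independent 3-layer ReLU MLPs and average their outputs (assumed aggregation)."""
    outs = [relu(relu(x @ w1) @ w2) @ w3 for w1, w2, w3 in zip(w1s, w2s, w3s)]
    return np.mean(outs, axis=0)


def fused_forward(x, w1s, w2s, w3s):
    """Evaluate the same ensemble as a single wide network."""
    m = len(w1s)
    # Input layer: all members read the same x, so their weights sit side by side.
    h1 = relu(x @ np.concatenate(w1s, axis=1))      # (n, m*h)
    # Hidden layer: block diagonal, so member i only mixes its own hidden units.
    h2 = relu(h1 @ block_diagonal(w2s))             # (n, m*h)
    # Output layer: stacking member heads vertically sums their outputs;
    # dividing by m turns the sum into the average.
    return h2 @ (np.concatenate(w3s, axis=0) / m)   # (n, d_out)


# Hypothetical sizes: m members, input dim d_in, hidden width h, output dim d_out.
rng = np.random.default_rng(0)
m, d_in, h, d_out, n = 4, 8, 16, 3, 5
w1s = [rng.standard_normal((d_in, h)) for _ in range(m)]
w2s = [rng.standard_normal((h, h)) for _ in range(m)]
w3s = [rng.standard_normal((h, d_out)) for _ in range(m)]
x = rng.standard_normal((n, d_in))

ensemble_out = ensemble_forward(x, w1s, w2s, w3s)
fused_out = fused_forward(x, w1s, w2s, w3s)
# Maximum discrepancy between the two evaluations; only floating-point round-off.
print(np.abs(ensemble_out - fused_out).max())
```

Materializing the block diagonal matrix is only for readability: with m members of hidden width h it has m·h² nonzero entries out of (m·h)², so a practical implementation would use grouped operations instead, such as the group convolutions mentioned above.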
Author Information
Etai Littwin (Apple)
Ben Myara (Apple)
Sima Sabah (Apple)
Joshua Susskind (Apple Inc.)
I was an undergraduate in Cognitive Science at UCSD from 1995 to 2003 (with some breaks). I then earned a PhD in machine learning and cognitive neuroscience from the University of Toronto, working with Dr. Geoff Hinton and Dr. Adam Anderson. After graduate school, I moved to UCSD for a postdoctoral position. Before coming to Apple, I co-founded Emotient in 2012 and led its deep learning effort for facial expression and demographics recognition. At Apple, I led the Face ID neural network team responsible for face recognition, and then started a machine learning research group within the hardware organization focused on fundamental ML technology.
Shuangfei Zhai (Apple)
Oren Golan (Apple)
Related Events (a corresponding poster, oral, or spotlight)
2020 Spotlight: Collegial Ensembles
Fri. Dec 11th, 03:30–03:40 AM, Room: Orals & Spotlights: Deep Learning
More from the Same Authors
2021: Robust Robotic Control from Pixels using Contrastive Recurrent State-Space Models
Nitish Srivastava · Walter Talbott · Shuangfei Zhai · Joshua Susskind
2022: The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon
Vimal Thilak · Etai Littwin · Shuangfei Zhai · Omid Saremi · Roni Paiss · Joshua Susskind
2023 Poster: Transformers learn through gradual rank increase
Emmanuel Abbe · Samy Bengio · Enric Boix-Adsera · Etai Littwin · Joshua Susskind
2023 Poster: PLANNER: Generating Diversified Paragraph via Latent Language Diffusion Model
Yizhe Zhang · Jiatao Gu · Zhuofeng Wu · Shuangfei Zhai · Joshua Susskind · Navdeep Jaitly
2022 Poster: GAUDI: A Neural Architect for Immersive 3D Scene Generation
Miguel Angel Bautista · Pengsheng Guo · Samira Abnar · Walter Talbott · Alexander Toshev · Zhuoyuan Chen · Laurent Dinh · Shuangfei Zhai · Hanlin Goh · Daniel Ulbricht · Afshin Dehghan · Joshua Susskind
2019 Poster: Adversarial Fisher Vectors for Unsupervised Representation Learning
Shuangfei Zhai · Walter Talbott · Carlos Guestrin · Joshua Susskind
2019 Spotlight: Adversarial Fisher Vectors for Unsupervised Representation Learning
Shuangfei Zhai · Walter Talbott · Carlos Guestrin · Joshua Susskind