Scaling laws have recently been employed to derive the compute-optimal model size (number of parameters) for a given compute duration. We advance and refine such methods to infer compute-optimal model shapes, such as width and depth, and successfully implement this in vision transformers. Our shape-optimized vision transformer, SoViT, achieves results competitive with models that exceed twice its size, despite being pre-trained with an equivalent amount of compute. For example, SoViT-400m/14 achieves 90.3% fine-tuning accuracy on ILSVRC2012, surpassing the much larger ViT-g/14 and approaching ViT-G/14 under identical settings, while also requiring less than half the inference cost. We conduct a thorough evaluation across multiple tasks, such as image classification, captioning, VQA, and zero-shot transfer, demonstrating the effectiveness of our model across a broad range of domains and identifying limitations. Overall, our findings challenge the prevailing approach of blindly scaling up vision models and pave a path for more informed scaling.
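To make the idea concrete, here is a minimal sketch of how a fitted scaling law can be used to select a compute-optimal value of one shape dimension (width, in this example). The saturating power-law form, the candidate widths, and the (compute, loss) measurements are illustrative assumptions for the sketch, not the paper's actual functional form or data:

```python
# Minimal sketch: fit a scaling law per candidate width, then pick the
# width with the lowest predicted loss at a target compute budget.
# Functional form, widths, and measurements are illustrative assumptions.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(compute, a, b, c):
    """Saturating power law: loss decreases with compute toward a floor c.
    Compute is normalized by 1e18 FLOPs to keep the fit well-conditioned."""
    return a * (compute / 1e18) ** (-b) + c

# Hypothetical (compute in FLOPs, validation loss) measurements
# collected from small training runs at three candidate widths.
observations = {
    512:  ([1e18, 3e18, 1e19, 1e20], [0.52, 0.48, 0.44, 0.40]),
    768:  ([1e18, 3e18, 1e19, 1e20], [0.55, 0.49, 0.43, 0.37]),
    1024: ([1e18, 3e18, 1e19, 1e20], [0.60, 0.51, 0.45, 0.38]),
}

target_compute = 1e21  # FLOPs budget we want to be optimal for

best_width, best_loss = None, float("inf")
for width, (compute, loss) in observations.items():
    # Fit the scaling-law parameters for this width.
    params, _ = curve_fit(scaling_law, compute, loss,
                          p0=(0.3, 0.3, 0.3), maxfev=10_000)
    # Extrapolate to the target budget and keep the best width.
    predicted = scaling_law(target_compute, *params)
    if predicted < best_loss:
        best_width, best_loss = width, predicted

print(f"Compute-optimal width at {target_compute:.0e} FLOPs: {best_width} "
      f"(predicted loss {best_loss:.3f})")
```

In the same spirit, each shape dimension mentioned in the abstract (e.g., width and depth) would get its own fit and selection step; the sketch above shows only a single dimension.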
Author Information
Ibrahim Alabdulmohsin (Google DeepMind)
Xiaohua Zhai (Google Brain)
Alexander Kolesnikov (Google Research, Brain team)
Lucas Beyer (Google Brain Zürich)
More from the Same Authors
- 2021: A Unified Few-Shot Classification Benchmark to Compare Transfer and Meta Learning Approaches
  Vincent Dumoulin · Neil Houlsby · Utku Evci · Xiaohua Zhai · Ross Goroshin · Sylvain Gelly · Hugo Larochelle
- 2023 Poster: Image Captioners Are Scalable Vision Learners Too
  Michael Tschannen · Manoj Kumar · Andreas Steiner · Xiaohua Zhai · Neil Houlsby · Lucas Beyer
- 2023 Oral: Image Captioners Are Scalable Vision Learners Too
  Michael Tschannen · Manoj Kumar · Andreas Steiner · Xiaohua Zhai · Neil Houlsby · Lucas Beyer
- 2023 Poster: Patch n’ Pack: NaViT, a Vision Transformer for any Aspect Ratio and Resolution
  Mostafa Dehghani · Basil Mustafa · Josip Djolonga · Jonathan Heek · Matthias Minderer · Mathilde Caron · Andreas Steiner · Joan Puigcerver · Robert Geirhos · Ibrahim Alabdulmohsin · Avital Oliver · Piotr Padlewski · Alexey Gritsenko · Mario Lucic · Neil Houlsby
- 2023 Poster: Three Towers: Flexible Contrastive Learning with Pretrained Image Models
  Jannik Kossen · Mark Collier · Basil Mustafa · Xiao Wang · Xiaohua Zhai · Lucas Beyer · Andreas Steiner · Jesse Berent · Rodolphe Jenatton · Effrosyni Kokiopoulou
- 2022 Poster: Diagnosing failures of fairness transfer across distribution shift in real-world medical settings
  Jessica Schrouff · Natalie Harris · Sanmi Koyejo · Ibrahim Alabdulmohsin · Eva Schnider · Krista Opsahl-Ong · Alexander Brown · Subhrajit Roy · Diana Mincu · Christina Chen · Awa Dieng · Yuan Liu · Vivek Natarajan · Alan Karthikesalingam · Katherine Heller · Silvia Chiappa · Alexander D'Amour
- 2022 Poster: UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes
  Alexander Kolesnikov · André Susano Pinto · Lucas Beyer · Xiaohua Zhai · Jeremiah Harmsen · Neil Houlsby
- 2022 Poster: A Reduction to Binary Approach for Debiasing Multiclass Datasets
  Ibrahim Alabdulmohsin · Jessica Schrouff · Sanmi Koyejo
- 2022 Poster: Fair Wrapping for Black-box Predictions
  Alexander Soen · Ibrahim Alabdulmohsin · Sanmi Koyejo · Yishay Mansour · Nyalleng Moorosi · Richard Nock · Ke Sun · Lexing Xie
- 2022 Poster: Revisiting Neural Scaling Laws in Language and Vision
  Ibrahim Alabdulmohsin · Behnam Neyshabur · Xiaohua Zhai
- 2021: Live panel: Did we solve ImageNet?
  Shibani Santurkar · Alexander Kolesnikov · Becca Roelofs
- 2021: Are we done with ImageNet?
  Alexander Kolesnikov
- 2021 Workshop: ImageNet: Past, Present, and Future
  Zeynep Akata · Lucas Beyer · Sanghyuk Chun · A. Sophia Koepke · Diane Larlus · Seong Joon Oh · Rafael Rezende · Sangdoo Yun · Xiaohua Zhai
- 2021 Poster: MLP-Mixer: An all-MLP Architecture for Vision
  Ilya Tolstikhin · Neil Houlsby · Alexander Kolesnikov · Lucas Beyer · Xiaohua Zhai · Thomas Unterthiner · Jessica Yung · Andreas Steiner · Daniel Keysers · Jakob Uszkoreit · Mario Lucic · Alexey Dosovitskiy
- 2021 Poster: Revisiting the Calibration of Modern Neural Networks
  Matthias Minderer · Josip Djolonga · Rob Romijnders · Frances Hubis · Xiaohua Zhai · Neil Houlsby · Dustin Tran · Mario Lucic