Recent advances in learning aligned multimodal representations have been primarily driven by training large neural networks on massive, noisy paired-modality datasets. In this work, we ask whether it is possible to achieve similar results with substantially less training time and data. We achieve this by taking advantage of existing pretrained unimodal encoders and careful curation of alignment data relevant to the downstream task of interest. We study a natural approach to aligning existing encoders via small auxiliary functions, and we find that this method is competitive with (or outperforms) the state of the art in many settings while being less prone to overfitting, less costly to train, and more robust to distribution shift. With a carefully chosen alignment distribution, our method surpasses the prior state of the art for ImageNet zero-shot classification on public data while using two orders of magnitude less time and data and training 77% fewer parameters.
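The abstract does not specify the form of the auxiliary alignment functions, so the following is only a minimal sketch, assuming small linear projection heads on top of frozen pretrained encoders, trained with a CLIP-style symmetric contrastive loss. All names (`AlignmentHead`, `contrastive_alignment_loss`), the placeholder encoders, the dimensions, and the choice of objective are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AlignmentHead(nn.Module):
    """Small auxiliary function mapping a frozen encoder's features into a shared space."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Linear(in_dim, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # L2-normalize so alignment is measured by cosine similarity.
        return F.normalize(self.proj(x), dim=-1)

def contrastive_alignment_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings (CLIP-style assumption)."""
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))

# Placeholders standing in for pretrained unimodal encoders; in practice these
# would be real vision/text backbones, kept frozen throughout alignment.
image_encoder = nn.Linear(2048, 2048).requires_grad_(False)
text_encoder = nn.Linear(768, 768).requires_grad_(False)

img_head = AlignmentHead(2048, 512)
txt_head = AlignmentHead(768, 512)
opt = torch.optim.AdamW(list(img_head.parameters()) + list(txt_head.parameters()), lr=1e-3)

# Stand-ins for a batch from a curated paired alignment dataset.
images, texts = torch.randn(32, 2048), torch.randn(32, 768)
with torch.no_grad():  # encoders stay frozen; only the heads receive gradients
    img_feats, txt_feats = image_encoder(images), text_encoder(texts)

loss = contrastive_alignment_loss(img_head(img_feats), txt_head(txt_feats))
opt.zero_grad(); loss.backward(); opt.step()
```

Because gradients flow only through the two small heads, the trainable parameter count and per-step cost stay far below those of end-to-end training, which is consistent with the efficiency claims in the abstract.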
Author Information
Elan Rosenfeld (Carnegie Mellon University)
Preetum Nakkiran (Harvard University)
Hadi Pouransari (Apple)
Oncel Tuzel (Apple)
Fartash Faghri (University of Toronto)
More from the Same Authors
- 2022 : Deconstructing Distributions: A Pointwise Framework of Learning
  Gal Kaplun · Nikhil Ghosh · Saurabh Garg · Boaz Barak · Preetum Nakkiran
- 2022 : Domain-Adjusted Regression or: ERM May Already Learn Features Sufficient for Out-of-Distribution Generalization
  Elan Rosenfeld · Pradeep Ravikumar · Andrej Risteski
- 2022 Poster: Iterative Feature Matching: Toward Provable Domain Generalization with Logarithmic Environments
  Yining Chen · Elan Rosenfeld · Mark Sellke · Tengyu Ma · Andrej Risteski
- 2021 Poster: Revisiting Model Stitching to Compare Neural Representations
  Yamini Bansal · Preetum Nakkiran · Boaz Barak
- 2020 : Contributed talks in Session 3 (Zoom)
  Mark Schmidt · Zhan Gao · Wenjie Li · Preetum Nakkiran · Denny Wu · Chengrun Yang
- 2020 : Contributed Video: Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems, Preetum Nakkiran
  Preetum Nakkiran
- 2020 : Poster Session 2 (gather.town)
  Sharan Vaswani · Nicolas Loizou · Wenjie Li · Preetum Nakkiran · Zhan Gao · Sina Baghal · Jingfeng Wu · Roozbeh Yousefzadeh · Jinyi Wang · Jing Wang · Cong Xie · Anastasia Borovykh · Stanislaw Jastrzebski · Soham Dan · Yiliang Zhang · Mark Tuddenham · Sarath Pattathil · Ievgen Redko · Jeremy Cohen · Yasaman Esfandiari · Zhanhong Jiang · Mostafa ElAraby · Chulhee Yun · Michael Psenka · Robert Gower · Xiaoyu Wang
- 2019 Poster: SGD on Neural Networks Learns Functions of Increasing Complexity
  Dimitris Kalimeris · Gal Kaplun · Preetum Nakkiran · Benjamin Edelman · Tristan Yang · Boaz Barak · Haofeng Zhang
- 2019 Spotlight: SGD on Neural Networks Learns Functions of Increasing Complexity
  Dimitris Kalimeris · Gal Kaplun · Preetum Nakkiran · Benjamin Edelman · Tristan Yang · Boaz Barak · Haofeng Zhang
- 2019 Poster: Data Parameters: A New Family of Parameters for Learning a Differentiable Curriculum
  Shreyas Saxena · Oncel Tuzel · Dennis DeCoste
- 2016 Poster: Coupled Generative Adversarial Networks
  Ming-Yu Liu · Oncel Tuzel
- 2014 Poster: Recursive Context Propagation Network for Semantic Scene Labeling
  Abhishek Sharma · Oncel Tuzel · Ming-Yu Liu