Poster
Revisiting Model Stitching to Compare Neural Representations
Yamini Bansal · Preetum Nakkiran · Boaz Barak
We revisit and extend model stitching (Lenc & Vedaldi 2015) as a methodology to study the internal representations of neural networks. Given two trained and frozen models $A$ and $B$, we consider a "stitched model" formed by connecting the bottom layers of $A$ to the top layers of $B$, with a simple trainable layer between them. We argue that model stitching is a powerful and perhaps under-appreciated tool, which reveals aspects of representations that measures such as centered kernel alignment (CKA) cannot. Through extensive experiments, we use model stitching to obtain quantitative verifications for intuitive statements such as "good networks learn similar representations", by demonstrating that good networks of the same architecture, but trained in very different ways (e.g., supervised vs. self-supervised learning), can be stitched to each other without a drop in performance. We also give evidence for the intuition that "more is better" by showing that representations learnt with (1) more data, (2) bigger width, or (3) more training time can be "plugged in" to weaker models to improve performance. Finally, our experiments reveal a new structural property of SGD which we call "stitching connectivity", akin to mode-connectivity: typical minima reached by SGD are all "stitching-connected" to each other.
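To make the stitching construction concrete, below is a minimal PyTorch sketch (the framework, the module names `bottom_A` and `top_B`, and the choice of a 1x1 convolution as the trainable stitching layer are illustrative assumptions, not the paper's exact implementation). It assumes the feature map produced by $A$'s bottom layers has the same number of channels that $B$'s top layers expect.

```python
import torch
import torch.nn as nn

class StitchedModel(nn.Module):
    """Connect the frozen bottom layers of model A to the frozen top
    layers of model B through a single trainable stitching layer.
    (Sketch under assumptions; not the authors' exact code.)"""

    def __init__(self, bottom_A: nn.Module, top_B: nn.Module, channels: int):
        super().__init__()
        self.bottom_A = bottom_A
        self.top_B = top_B
        # Freeze both halves; only the stitching layer will be trained.
        for p in self.bottom_A.parameters():
            p.requires_grad = False
        for p in self.top_B.parameters():
            p.requires_grad = False
        # A 1x1 convolution is one simple choice of stitching layer for
        # convolutional feature maps: an affine map applied per spatial location.
        self.stitch = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, x):
        h = self.bottom_A(x)   # representation from A's bottom layers
        h = self.stitch(h)     # trainable alignment layer
        return self.top_B(h)   # prediction from B's top layers
```

In use, only `model.stitch.parameters()` would be passed to the optimizer; comparing the stitched model's test accuracy to that of $A$ and $B$ on their own gives the "stitching penalty" that the experiments measure.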
Author Information
Yamini Bansal (Harvard University)
Preetum Nakkiran (Harvard)
Boaz Barak (Harvard University)
More from the Same Authors
- 2022 : APE: Aligning Pretrained Encoders to Quickly Learn Aligned Multimodal Representations »
  Elan Rosenfeld · Preetum Nakkiran · Hadi Pouransari · Oncel Tuzel · Fartash Faghri
- 2022 : Deconstructing Distributions: A Pointwise Framework of Learning »
  Gal Kaplun · Nikhil Ghosh · Saurabh Garg · Boaz Barak · Preetum Nakkiran
- 2023 Poster: Scaling Data-Constrained Language Models »
  Niklas Muennighoff · Alexander Rush · Boaz Barak · Teven Le Scao · Nouamane Tazi · Aleksandra Piktus · Sampo Pyysalo · Thomas Wolf · Colin Raffel
- 2023 Oral: Scaling Data-Constrained Language Models »
  Niklas Muennighoff · Alexander Rush · Boaz Barak · Teven Le Scao · Nouamane Tazi · Aleksandra Piktus · Sampo Pyysalo · Thomas Wolf · Colin Raffel
- 2022 Poster: Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit »
  Boaz Barak · Benjamin Edelman · Surbhi Goel · Sham Kakade · Eran Malach · Cyril Zhang
- 2020 : Contributed talks in Session 3 (Zoom) »
  Mark Schmidt · Zhan Gao · Wenjie Li · Preetum Nakkiran · Denny Wu · Chengrun Yang
- 2020 : Contributed Video: Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems, Preetum Nakkiran »
  Preetum Nakkiran
- 2020 : Poster Session 2 (gather.town) »
  Sharan Vaswani · Nicolas Loizou · Wenjie Li · Preetum Nakkiran · Zhan Gao · Sina Baghal · Jingfeng Wu · Roozbeh Yousefzadeh · Jinyi Wang · Jing Wang · Cong Xie · Anastasia Borovykh · Stanislaw Jastrzebski · Soham Dan · Yiliang Zhang · Mark Tuddenham · Sarath Pattathil · Ievgen Redko · Jeremy Cohen · Yasaman Esfandiari · Zhanhong Jiang · Mostafa ElAraby · Chulhee Yun · Michael Psenka · Robert Gower · Xiaoyu Wang
- 2019 Poster: SGD on Neural Networks Learns Functions of Increasing Complexity »
  Dimitris Kalimeris · Gal Kaplun · Preetum Nakkiran · Benjamin Edelman · Tristan Yang · Boaz Barak · Haofeng Zhang
- 2019 Spotlight: SGD on Neural Networks Learns Functions of Increasing Complexity »
  Dimitris Kalimeris · Gal Kaplun · Preetum Nakkiran · Benjamin Edelman · Tristan Yang · Boaz Barak · Haofeng Zhang
- 2019 Poster: (Nearly) Efficient Algorithms for the Graph Matching Problem on Correlated Random Graphs »
  Boaz Barak · Chi-Ning Chou · Zhixian Lei · Tselil Schramm · Yueqi Sheng