Despite the phenomenal success of deep neural networks in a broad range of learning tasks, a theory explaining how they work is still lacking. In particular, Convolutional Neural Networks (CNNs) are known to perform much better than Fully-Connected Networks (FCNs) on spatially structured data: the architectural structure of CNNs encodes prior knowledge about the features of the data, for instance their translation invariance. The aim of this work is to understand this fact through the lens of dynamics in the loss landscape.
We introduce a method that maps a CNN to its equivalent FCN (denoted eFCN). Such an embedding enables the comparison of CNN and FCN training dynamics directly in FCN space. We use this method to test a new training protocol, which consists of training a CNN, embedding it in FCN space at a certain "relax time", then resuming the training in FCN space. We observe that for all relax times, the deviation from the CNN subspace is small, and the final performance reached by the eFCN is higher than that reachable by a standard FCN of the same architecture. More surprisingly, for some intermediate relax times, the eFCN outperforms the CNN it stemmed from, by combining the prior information of the CNN and the expressivity of the FCN in a complementary way. The practical interest of our protocol is limited by the very large size of the highly sparse eFCN. However, it offers interesting insights into the persistence of architectural bias under stochastic gradient dynamics. It shows the existence of some rare basins in the FCN loss landscape associated with very good generalization. These can only be accessed thanks to the CNN prior, which helps navigate the landscape during the early stages of optimization.
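The idea of embedding a convolution in FCN space can be illustrated on a toy case. The sketch below is not the authors' implementation: it assumes a single-channel, stride-1, unpadded 1D cross-correlation, and the helper names `conv1d_to_dense` and `project_to_conv` are hypothetical. It builds the dense weight matrix that reproduces the convolution, checks that the two give the same output, and measures how far a perturbed dense matrix lies from the convolutional subspace (the matrices with local support and shared weights).

```python
import numpy as np

def conv1d_to_dense(kernel, n_in):
    """Dense matrix equal to a stride-1, unpadded 1D cross-correlation."""
    k = len(kernel)
    n_out = n_in - k + 1
    W = np.zeros((n_out, n_in))
    for i in range(n_out):
        W[i, i:i + k] = kernel  # same kernel on every row: weight sharing
    return W

def project_to_conv(W, k):
    """Closest conv-structured matrix (Frobenius norm): average the shared
    weights along each diagonal and zero everything outside the support."""
    n_out, n_in = W.shape
    kernel = np.array([np.mean([W[i, i + j] for i in range(n_out)])
                       for j in range(k)])
    return conv1d_to_dense(kernel, n_in)

rng = np.random.default_rng(0)
kernel = rng.normal(size=3)
x = rng.normal(size=8)

# Toy "eFCN" layer: a 6 x 8 matrix with only 18 non-zero entries.
W = conv1d_to_dense(kernel, n_in=8)
conv_out = np.correlate(x, kernel, mode="valid")
dense_out = W @ x

# Toy stand-in for training in FCN space after the relax time:
# a small unconstrained perturbation of the dense weights.
W_relaxed = W + 0.01 * rng.normal(size=W.shape)
deviation = np.linalg.norm(W_relaxed - project_to_conv(W_relaxed, 3))
```

Even in this small example most entries of `W` are zero, which hints at why the eFCN of a realistic CNN is very large and highly sparse. The `deviation` value is only one possible way to quantify the distance from the CNN subspace and is an assumption of this sketch.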
Author Information
Stéphane d'Ascoli (ENS / FAIR)
Currently a joint Ph.D. student between ENS (supervised by Giulio Biroli) and FAIR (supervised by Levent Sagun). Working on theory of deep learning.
Levent Sagun (Facebook AI Research)
Giulio Biroli (ENS)
Joan Bruna (NYU)
More from the Same Authors
-
2021 : An Extensible Benchmark Suite for Learning to Simulate Physical Systems »
Karl Otness · Arvi Gjoka · Joan Bruna · Daniele Panozzo · Benjamin Peherstorfer · Teseo Schneider · Denis Zorin -
2021 Spotlight: Offline RL Without Off-Policy Evaluation »
David Brandfonbrener · Will Whitney · Rajesh Ranganath · Joan Bruna -
2021 : Quantile Filtered Imitation Learning »
David Brandfonbrener · Will Whitney · Rajesh Ranganath · Joan Bruna -
2022 : Group Excess Risk Bound of Overparameterized Linear Regression with Constant-Stepsize SGD »
Arjun Subramonian · Levent Sagun · Kai-Wei Chang · Yizhou Sun -
2022 Poster: Exponential Separations in Symmetric Neural Networks »
Aaron Zweig · Joan Bruna -
2022 Poster: When does return-conditioned supervised learning work for offline reinforcement learning? »
David Brandfonbrener · Alberto Bietti · Jacob Buckman · Romain Laroche · Joan Bruna -
2022 Poster: On Non-Linear operators for Geometric Deep Learning »
Grégoire Sergeant-Perthuis · Jakob Maier · Joan Bruna · Edouard Oyallon -
2022 Poster: End-to-end Symbolic Regression with Transformers »
Pierre-Alexandre Kamienny · Stéphane d'Ascoli · Guillaume Lample · François Charton -
2022 Poster: Learning single-index models with shallow neural networks »
Alberto Bietti · Joan Bruna · Clayton Sanford · Min Jae Song -
2021 Poster: On the Sample Complexity of Learning under Geometric Stability »
Alberto Bietti · Luca Venturi · Joan Bruna -
2021 Poster: On the interplay between data structure and loss function in classification problems »
Stéphane d'Ascoli · Marylou Gabrié · Levent Sagun · Giulio Biroli -
2021 Poster: On the Cryptographic Hardness of Learning Single Periodic Neurons »
Min Jae Song · Ilias Zadik · Joan Bruna -
2021 Poster: Offline RL Without Off-Policy Evaluation »
David Brandfonbrener · Will Whitney · Rajesh Ranganath · Joan Bruna -
2020 Poster: A mean-field analysis of two-player zero-sum games »
Carles Domingo-Enrich · Samy Jelassi · Arthur Mensch · Grant Rotskoff · Joan Bruna -
2020 Poster: Can Graph Neural Networks Count Substructures? »
Zhengdao Chen · Lei Chen · Soledad Villar · Joan Bruna -
2020 Session: Orals & Spotlights Track 26: Graph/Relational/Theory »
Joan Bruna · Cassio de Campos -
2020 Poster: Triple descent and the two kinds of overfitting: where & why do they appear? »
Stéphane d'Ascoli · Levent Sagun · Giulio Biroli -
2020 Poster: An analytic theory of shallow networks dynamics for hinge loss classification »
Franco Pellegrini · Giulio Biroli -
2020 Poster: IDEAL: Inexact DEcentralized Accelerated Augmented Lagrangian Method »
Yossi Arjevani · Joan Bruna · Bugra Can · Mert Gurbuzbalaban · Stefanie Jegelka · Hongzhou Lin -
2020 Spotlight: Triple descent and the two kinds of overfitting: where & why do they appear? »
Stéphane d'Ascoli · Levent Sagun · Giulio Biroli -
2020 Spotlight: IDEAL: Inexact DEcentralized Accelerated Augmented Lagrangian Method »
Yossi Arjevani · Joan Bruna · Bugra Can · Mert Gurbuzbalaban · Stefanie Jegelka · Hongzhou Lin -
2020 Poster: Complex Dynamics in Simple Neural Networks: Understanding Gradient Flow in Phase Retrieval »
Stefano Sarao Mannelli · Giulio Biroli · Chiara Cammarota · Florent Krzakala · Pierfrancesco Urbani · Lenka Zdeborová -
2020 Poster: A Dynamical Central Limit Theorem for Shallow Neural Networks »
Zhengdao Chen · Grant Rotskoff · Joan Bruna · Eric Vanden-Eijnden -
2019 : Poster and Coffee Break 1 »
Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova -
2019 : Surya Ganguli, Yasaman Bahri, Florent Krzakala moderated by Lenka Zdeborova »
Florent Krzakala · Yasaman Bahri · Surya Ganguli · Lenka Zdeborová · Adji Bousso Dieng · Joan Bruna -
2019 : Poster Spotlight 1 »
David Brandfonbrener · Joan Bruna · Tom Zahavy · Haim Kaplan · Yishay Mansour · Nikos Karampatziakis · John Langford · Paul Mineiro · Donghwan Lee · Niao He -
2019 Workshop: Science meets Engineering of Deep Learning »
Levent Sagun · Caglar Gulcehre · Adriana Romero Soriano · Negar Rostamzadeh · Nando de Freitas -
2019 : Welcoming remarks and introduction »
Levent Sagun · Caglar Gulcehre · Adriana Romero Soriano · Negar Rostamzadeh · Nando de Freitas -
2019 : Opening Remarks »
Reinhard Heckel · Paul Hand · Alex Dimakis · Joan Bruna · Deanna Needell · Richard Baraniuk -
2019 Workshop: Solving inverse problems with deep networks: New architectures, theoretical foundations, and applications »
Reinhard Heckel · Paul Hand · Richard Baraniuk · Joan Bruna · Alex Dimakis · Deanna Needell -
2019 Poster: Gradient Dynamics of Shallow Univariate ReLU Networks »
Francis Williams · Matthew Trager · Daniele Panozzo · Claudio Silva · Denis Zorin · Joan Bruna -
2019 Poster: On the Expressive Power of Deep Polynomial Neural Networks »
Joe Kileel · Matthew Trager · Joan Bruna -
2019 Poster: On the equivalence between graph isomorphism testing and function approximation with GNNs »
Zhengdao Chen · Soledad Villar · Lei Chen · Joan Bruna -
2019 Poster: Stability of Graph Scattering Transforms »
Fernando Gama · Alejandro Ribeiro · Joan Bruna -
2019 Poster: Who is Afraid of Big Bad Minima? Analysis of gradient-flow in spiked matrix-tensor models »
Stefano Sarao Mannelli · Giulio Biroli · Chiara Cammarota · Florent Krzakala · Lenka Zdeborová -
2019 Spotlight: Who is Afraid of Big Bad Minima? Analysis of gradient-flow in spiked matrix-tensor models »
Stefano Sarao Mannelli · Giulio Biroli · Chiara Cammarota · Florent Krzakala · Lenka Zdeborová -
2018 : Invited Talk 3 »
Joan Bruna -
2018 : Joan Bruna »
Joan Bruna -
2017 Tutorial: Geometric Deep Learning on Graphs and Manifolds »
Michael Bronstein · Joan Bruna · Arthur Szlam · Xavier Bresson · Yann LeCun -
2014 Poster: Exploiting Linear Structure Within Convolutional Networks for Efficient Evaluation »
Emily Denton · Wojciech Zaremba · Joan Bruna · Yann LeCun · Rob Fergus