Neural networks trained with stochastic gradient descent (SGD) starting from different random initialisations typically find functionally very similar solutions, raising the question of whether there are meaningful differences between different SGD solutions. Entezari et al. recently conjectured that, despite different initialisations, the solutions found by SGD lie in the same loss valley once the permutation invariance of neural networks is taken into account. Concretely, they hypothesise that any two solutions found by SGD can be permuted such that the linear interpolation between their parameters forms a path without significant increases in loss. Here, we use a simple but powerful algorithm to find such permutations, which allows us to obtain direct empirical evidence that the hypothesis holds in fully connected networks. Strikingly, we find that two networks already live in the same loss valley at the time of initialisation, and that averaging their random, but suitably permuted, initialisations performs significantly above chance. In contrast, for convolutional architectures, our evidence suggests that the hypothesis does not hold: especially in the large-learning-rate regime, SGD seems to discover diverse modes.
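The abstract does not spell out the permutation-finding algorithm, so the following is only a minimal sketch of one standard approach to the problem it describes: activation matching solved with the Hungarian algorithm, for a one-hidden-layer MLP. All function and variable names (match_hidden_units, W1_a, etc.) are illustrative assumptions, not the paper's implementation.

```python
# Sketch: align the hidden units of network B with those of network A,
# then linearly interpolate the (permuted) parameters. Hypothetical names;
# not the authors' code.
import numpy as np
from scipy.optimize import linear_sum_assignment

def relu(x):
    return np.maximum(x, 0.0)

def match_hidden_units(W1_a, b1_a, W1_b, b1_b, X):
    """Permutation of B's hidden units that best aligns their
    activations with A's on a shared data batch X (batch, input)."""
    H_a = relu(X @ W1_a.T + b1_a)   # (batch, hidden) activations of A
    H_b = relu(X @ W1_b.T + b1_b)   # (batch, hidden) activations of B
    C = H_a.T @ H_b                 # (hidden, hidden) pairwise similarity
    # Hungarian algorithm maximises total matched similarity;
    # linear_sum_assignment minimises cost, hence the minus sign.
    _, perm = linear_sum_assignment(-C)
    return perm                     # perm[i] = unit of B matched to unit i of A

def permute_hidden_units(W1_b, b1_b, W2_b, perm):
    """Permuting rows of layer 1 and columns of layer 2 together
    leaves the function computed by network B unchanged."""
    return W1_b[perm], b1_b[perm], W2_b[:, perm]

def interpolate(params_a, params_b, alpha):
    """Linear interpolation in parameter space."""
    return [(1 - alpha) * pa + alpha * pb
            for pa, pb in zip(params_a, params_b)]
```

For deeper networks the same matching step would be applied layer by layer, and scanning alpha over [0, 1] after permuting traces the interpolation path whose loss the hypothesis predicts stays low.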
Author Information
Frederik Benzing (ETH Zurich)
Simon Schug (ETH Zurich)
Robert Meier (Department of Computer Science, ETH Zurich)
Johannes von Oswald (ETH Zurich)
Yassir Akram (ETH Zurich)
Nicolas Zucchet (ETH Zurich)
Laurence Aitchison (University of Cambridge)
Angelika Steger (ETH Zurich)
More from the Same Authors
- 2021 : Fast, Exact Subsampled Natural Gradients and First-Order KFAC »
  Frederik Benzing
- 2022 : Poster Session 1 »
  Andrew Lowy · Thomas Bonnier · Yiling Xie · Guy Kornowski · Simon Schug · Seungyub Han · Nicolas Loizou · xinwei zhang · Laurent Condat · Tabea E. Röber · Si Yi Meng · Marco Mondelli · Runlong Zhou · Eshaan Nichani · Adrian Goldwaser · Rudrajit Das · Kayhan Behdin · Atish Agarwala · Mukul Gagrani · Gary Cheng · Tian Li · Haoran Sun · Hossein Taheri · Allen Liu · Siqi Zhang · Dmitrii Avdiukhin · Bradley Brown · Miaolan Xie · Junhyung Lyle Kim · Sharan Vaswani · Xinmeng Huang · Ganesh Ramachandra Kini · Angela Yuan · Weiqiang Zheng · Jiajin Li
- 2022 Poster: A contrastive rule for meta-learning »
  Nicolas Zucchet · Simon Schug · Johannes von Oswald · Dominic Zhao · João Sacramento
- 2022 Poster: The least-control principle for local learning at equilibrium »
  Alexander Meulemans · Nicolas Zucchet · Seijin Kobayashi · Johannes von Oswald · João Sacramento
- 2022 Poster: Disentangling the Predictive Variance of Deep Ensembles through the Neural Tangent Kernel »
  Seijin Kobayashi · Pau Vilimelis Aceituno · Johannes von Oswald
- 2022 Poster: Open-Ended Reinforcement Learning with Neural Reward Functions »
  Robert Meier · Asier Mujika
- 2021 : Poster Session 1 (gather.town) »
  Hamed Jalali · Robert Hönig · Maximus Mutschler · Manuel Madeira · Abdurakhmon Sadiev · Egor Shulgin · Alasdair Paren · Pascal Esser · Simon Roburin · Julius Kunze · Agnieszka Słowik · Frederik Benzing · Futong Liu · Hongyi Li · Ryotaro Mitsuboshi · Grigory Malinovsky · Jayadev Naram · Zhize Li · Igor Sokolov · Sharan Vaswani
- 2021 : Contributed Talks in Session 1 (Zoom) »
  Sebastian Stich · Futong Liu · Abdurakhmon Sadiev · Frederik Benzing · Simon Roburin
- 2021 Poster: Posterior Meta-Replay for Continual Learning »
  Christian Henning · Maria Cervera · Francesco D'Angelo · Johannes von Oswald · Regina Traber · Benjamin Ehret · Seijin Kobayashi · Benjamin F. Grewe · João Sacramento
- 2021 Poster: Learning where to learn: Gradient sparsity in meta and continual learning »
  Johannes von Oswald · Dominic Zhao · Seijin Kobayashi · Simon Schug · Massimo Caccia · Nicolas Zucchet · João Sacramento
- 2021 Poster: A variational approximate posterior for the deep Wishart process »
  Sebastian Ober · Laurence Aitchison
- 2020 Poster: Bayesian filtering unifies adaptive and non-adaptive neural network optimization methods »
  Laurence Aitchison
- 2019 : Poster and Coffee Break 1 »
  Aaron Sidford · Aditya Mahajan · Alejandro Ribeiro · Alex Lewandowski · Ali H Sayed · Ambuj Tewari · Angelika Steger · Anima Anandkumar · Asier Mujika · Hilbert J Kappen · Bolei Zhou · Byron Boots · Chelsea Finn · Chen-Yu Wei · Chi Jin · Ching-An Cheng · Christina Yu · Clement Gehring · Craig Boutilier · Dahua Lin · Daniel McNamee · Daniel Russo · David Brandfonbrener · Denny Zhou · Devesh Jha · Diego Romeres · Doina Precup · Dominik Thalmeier · Eduard Gorbunov · Elad Hazan · Elena Smirnova · Elvis Dohmatob · Emma Brunskill · Enrique Munoz de Cote · Ethan Waldie · Florian Meier · Florian Schaefer · Ge Liu · Gergely Neu · Haim Kaplan · Hao Sun · Hengshuai Yao · Jalaj Bhandari · James A Preiss · Jayakumar Subramanian · Jiajin Li · Jieping Ye · Jimmy Smith · Joan Bas Serrano · Joan Bruna · John Langford · Jonathan Lee · Jose A. Arjona-Medina · Kaiqing Zhang · Karan Singh · Yuping Luo · Zafarali Ahmed · Zaiwei Chen · Zhaoran Wang · Zhizhong Li · Zhuoran Yang · Ziping Xu · Ziyang Tang · Yi Mao · David Brandfonbrener · Shirli Di-Castro · Riashat Islam · Zuyue Fu · Abhishek Naik · Saurabh Kumar · Benjamin Petit · Angeliki Kamoutsi · Simone Totaro · Arvind Raghunathan · Rui Wu · Donghwan Lee · Dongsheng Ding · Alec Koppel · Hao Sun · Christian Tjandraatmadja · Mahdi Karami · Jincheng Mei · Chenjun Xiao · Junfeng Wen · Zichen Zhang · Ross Goroshin · Mohammad Pezeshki · Jiaqi Zhai · Philip Amortila · Shuo Huang · Mariya Vasileva · El houcine Bergou · Adel Ahmadyan · Haoran Sun · Sheng Zhang · Lukas Gruber · Yuanhao Wang · Tetiana Parshakova
- 2019 Poster: Tensor Monte Carlo: Particle Methods for the GPU era »
  Laurence Aitchison
- 2018 Poster: Approximating Real-Time Recurrent Learning with Random Kronecker Factors »
  Asier Mujika · Florian Meier · Angelika Steger
- 2017 Oral: Model-based Bayesian inference of neural activity and connectivity from all-optical interrogation of a neural circuit »
  Laurence Aitchison · Lloyd Russell · Adam Packer · Jinyao Yan · Philippe Castonguay · Michael Hausser · Srinivas C Turaga
- 2017 Poster: Model-based Bayesian inference of neural activity and connectivity from all-optical interrogation of a neural circuit »
  Laurence Aitchison · Lloyd Russell · Adam Packer · Jinyao Yan · Philippe Castonguay · Michael Hausser · Srinivas C Turaga
- 2017 Poster: Fast-Slow Recurrent Neural Networks »
  Asier Mujika · Florian Meier · Angelika Steger
- 2014 Poster: Fast Sampling-Based Inference in Balanced Neuronal Networks »
  Guillaume Hennequin · Laurence Aitchison · Mate Lengyel