Regularization is typically understood as improving generalization by altering the landscape of local extrema to which the model eventually converges. Deep neural networks (DNNs), however, challenge this view: we show that removing regularization after an initial transient period has little effect on generalization, even if the final loss landscape is the same as if there had been no regularization. In some cases, generalization even improves after regularization is interrupted. Conversely, if regularization is applied only after the initial transient, it has no effect on the final solution, whose generalization gap is as bad as if regularization had never been applied. This suggests that what matters for training deep networks is not just whether or how to regularize, but when. The phenomena we observe are manifest across different datasets (CIFAR-10, CIFAR-100, SVHN, ImageNet), different architectures (ResNet-18, All-CNN), different regularization methods (weight decay, data augmentation, mixup), and different learning rate schedules (exponential, piece-wise constant). They collectively suggest that there is a "critical period" for regularizing deep networks that is decisive for the final performance. More analysis should, therefore, focus on the transient rather than the asymptotic behavior of learning.
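The two interventions described above (regularizing only during the initial transient, or only after it) can be pictured as a time-dependent regularization coefficient. The following is a minimal Python sketch, assuming weight decay as the regularizer and plain SGD as the optimizer; the switch epoch `t0`, the coefficient values, and the helper names `weight_decay_at` and `sgd_step` are hypothetical and illustrative only, not the authors' code or training setup.

```python
from typing import Literal

# Hypothetical names for the four regularization regimes being compared.
Mode = Literal["always", "never", "remove_after", "apply_after"]


def weight_decay_at(epoch: int, mode: Mode, t0: int, wd: float) -> float:
    """Return the (hypothetical) weight-decay coefficient used at `epoch`.

    - "always":       regularize for the whole run
    - "never":        no regularization at all
    - "remove_after": regularize only during the initial transient [0, t0)
    - "apply_after":  regularize only from epoch t0 onward
    """
    if mode == "always":
        return wd
    if mode == "never":
        return 0.0
    if mode == "remove_after":
        return wd if epoch < t0 else 0.0
    if mode == "apply_after":
        return 0.0 if epoch < t0 else wd
    raise ValueError(f"unknown mode: {mode}")


def sgd_step(weights: list[float], grads: list[float], lr: float, wd: float) -> list[float]:
    """One plain SGD step with L2 weight decay: w <- w - lr * (g + wd * w)."""
    return [w - lr * (g + wd * w) for w, g in zip(weights, grads)]
```

For example, `[weight_decay_at(e, "remove_after", t0=50, wd=5e-4) for e in range(200)]` yields a coefficient of `5e-4` for the first 50 epochs and `0.0` afterwards, while `"apply_after"` gives the mirror image. Passing the per-epoch coefficient into the update step keeps the optimizer unchanged and isolates *when* regularization acts as the only variable.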
Author Information
Aditya Sharad Golatkar (UCLA)
Alessandro Achille (AWS)
Stefano Soatto (UCLA)
More from the Same Authors
2022 Poster: On Leave-One-Out Conditional Mutual Information For Generalization
Mohamad Rida Rammal · Alessandro Achille · Aditya Golatkar · Suhas Diggavi · Stefano Soatto
2020 Workshop: Deep Learning through Information Geometry
Pratik Chaudhari · Alexander Alemi · Varun Jog · Dhagash Mehta · Frank Nielsen · Stefano Soatto · Greg Ver Steeg
2020 Poster: Predicting Training Time Without Training
Luca Zancato · Alessandro Achille · Avinash Ravichandran · Rahul Bhotika · Stefano Soatto
2020 Poster: Geo-PIFu: Geometry and Pixel Aligned Implicit Functions for Single-view Human Reconstruction
Tong He · John Collomosse · Hailin Jin · Stefano Soatto
2020 Poster: Targeted Adversarial Perturbations for Monocular Depth Prediction
Alex Wong · Safa Cicek · Stefano Soatto
2019 Invited Talk: Stefano Soatto and Alessandro Achille
Stefano Soatto · Alessandro Achille
2018 Poster: Life-Long Disentangled Representation Learning with Cross-Domain Latent Homologies
Alessandro Achille · Tom Eccles · Loic Matthey · Chris Burgess · Nicholas Watters · Alexander Lerchner · Irina Higgins
2018 Spotlight: Life-Long Disentangled Representation Learning with Cross-Domain Latent Homologies
Alessandro Achille · Tom Eccles · Loic Matthey · Chris Burgess · Nicholas Watters · Alexander Lerchner · Irina Higgins
2017: Stefano Soatto
Stefano Soatto
2012 Poster: Controlled Recognition Bounds for Visual Learning and Exploration
Vasiliy Karasev · Alessandro Chiuso · Stefano Soatto
2011 Poster: Multiple Instance Filtering
Kamil A Wnuk · Stefano Soatto