Fine-tuning pre-trained deep networks on a small dataset is an important component of the deep learning pipeline. A critical problem in fine-tuning is how to avoid over-fitting when data are limited. Existing approaches address this in two ways: (1) imposing regularization on parameters or features; (2) transferring prior knowledge to fine-tuning by reusing pre-trained parameters. In this paper, we take an alternative approach by refactoring the widely used Batch Normalization (BN) module to mitigate over-fitting. We propose a two-branch design with one branch normalized by mini-batch statistics and the other branch normalized by moving statistics. During training, one of the two branches is stochastically selected to avoid over-depending on particular sample statistics, resulting in a strong regularization effect, which we interpret as "architecture regularization." The resulting method is dubbed stochastic normalization (StochNorm). With the two-branch architecture, it naturally incorporates pre-trained moving statistics in BN layers during fine-tuning, exploiting more prior knowledge of pre-trained networks. Extensive experiments show that StochNorm is a powerful tool to avoid over-fitting when fine-tuning on small datasets. In addition, StochNorm is readily pluggable into modern CNN backbones. It is complementary to other fine-tuning methods and can be combined with them to achieve a stronger regularization effect.
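The two-branch design described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `stoch_norm_1d`, the selection probability `p`, the per-channel granularity of the branch selection, and the use of the moving-statistics branch at inference time are all assumptions made for illustration. The sketch also leaves out the learnable affine parameters and the update of the moving statistics that a full BN-style layer would carry.

```python
import math
import random


def stoch_norm_1d(batch, moving_mean, moving_var, p=0.5,
                  training=True, eps=1e-5, rng=random):
    """Normalize a batch (list of N samples, each a list of C channel values).

    Hypothetical sketch: for each channel, pick one of two branches.
      - batch branch:  normalize with the mini-batch mean/variance
      - moving branch: normalize with the stored moving mean/variance
    The moving branch is chosen with probability p during training
    (an assumed parameter) and always when training is False.
    """
    n, c = len(batch), len(batch[0])
    out = [[0.0] * c for _ in range(n)]
    for j in range(c):
        col = [row[j] for row in batch]
        use_moving = (not training) or (rng.random() < p)
        if use_moving:
            mean, var = moving_mean[j], moving_var[j]
        else:
            mean = sum(col) / n
            var = sum((v - mean) ** 2 for v in col) / n  # biased variance
        scale = 1.0 / math.sqrt(var + eps)
        for i in range(n):
            out[i][j] = (col[i] - mean) * scale
    return out


if __name__ == "__main__":
    x = [[1.0, 2.0], [3.0, 6.0]]
    # Moving statistics would normally come from the pre-trained network.
    print(stoch_norm_1d(x, moving_mean=[0.0, 0.0], moving_var=[1.0, 1.0]))
```

Setting `p=0.0` reduces the sketch to plain mini-batch normalization, while `p=1.0` always uses the stored moving statistics, so `p` interpolates between the two behaviors.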
Author Information
Zhi Kou (Tsinghua University)
Kaichao You (Tsinghua University)
Mingsheng Long (Tsinghua University)
Jianmin Wang (Tsinghua University)
More from the Same Authors
- 2022 Poster: Hub-Pathway: Transfer Learning from A Hub of Pre-trained Models
  Yang Shu · Zhangjie Cao · Ziyang Zhang · Jianmin Wang · Mingsheng Long
- 2022 Poster: Supported Policy Optimization for Offline Reinforcement Learning
  Jialong Wu · Haixu Wu · Zihan Qiu · Jianmin Wang · Mingsheng Long
- 2022 Poster: Non-stationary Transformers: Exploring the Stationarity in Time Series Forecasting
  Yong Liu · Haixu Wu · Jianmin Wang · Mingsheng Long
- 2022: Domain Adaptation: Theory, Algorithms, and Open Library
  Mingsheng Long
- 2022 Poster: Debiased Self-Training for Semi-Supervised Learning
  Baixu Chen · Junguang Jiang · Ximei Wang · Pengfei Wan · Jianmin Wang · Mingsheng Long
- 2021 Poster: Cycle Self-Training for Domain Adaptation
  Hong Liu · Jianmin Wang · Mingsheng Long
- 2021 Poster: Autoformer: Decomposition Transformers with Auto-Correlation for Long-Term Series Forecasting
  Haixu Wu · Jiehui Xu · Jianmin Wang · Mingsheng Long
- 2020 Poster: Co-Tuning for Transfer Learning
  Kaichao You · Zhi Kou · Mingsheng Long · Jianmin Wang
- 2020 Poster: Transferable Calibration with Lower Bias and Variance in Domain Adaptation
  Ximei Wang · Mingsheng Long · Jianmin Wang · Michael Jordan
- 2020 Poster: Learning to Adapt to Evolving Domains
  Hong Liu · Mingsheng Long · Jianmin Wang · Yu Wang
- 2019 Poster: Catastrophic Forgetting Meets Negative Transfer: Batch Spectral Shrinkage for Safe Transfer Learning
  Xinyang Chen · Sinan Wang · Bo Fu · Mingsheng Long · Jianmin Wang
- 2019 Poster: Transferable Normalization: Towards Improving Transferability of Deep Neural Networks
  Ximei Wang · Ying Jin · Mingsheng Long · Jianmin Wang · Michael Jordan
- 2018 Poster: Conditional Adversarial Domain Adaptation
  Mingsheng Long · Zhangjie Cao · Jianmin Wang · Michael Jordan
- 2018 Poster: Generalized Zero-Shot Learning with Deep Calibration Network
  Shichen Liu · Mingsheng Long · Jianmin Wang · Michael Jordan
- 2017 Poster: PredRNN: Recurrent Neural Networks for Predictive Learning using Spatiotemporal LSTMs
  Yunbo Wang · Mingsheng Long · Jianmin Wang · Zhifeng Gao · Philip S Yu
- 2017 Poster: Learning Multiple Tasks with Multilinear Relationship Networks
  Mingsheng Long · Zhangjie Cao · Jianmin Wang · Philip S Yu
- 2016 Poster: Unsupervised Domain Adaptation with Residual Transfer Networks
  Mingsheng Long · Han Zhu · Jianmin Wang · Michael Jordan
- 2015 Workshop: Transfer and Multi-Task Learning: Trends and New Perspectives
  Anastasia Pentina · Christoph Lampert · Sinno Jialin Pan · Mingsheng Long · Judy Hoffman · Baochen Sun · Kate Saenko