In diffusion models, UNet is the most popular network backbone, since its long skip connections (LSCs), which connect distant network blocks, can aggregate long-distance information and alleviate vanishing gradients. Unfortunately, UNet often suffers from unstable training in diffusion models, which can be alleviated by scaling its LSC coefficients to smaller values. However, a theoretical understanding of the instability of UNet in diffusion models, and of the performance improvement brought by LSC scaling, is still missing. To address this issue, we theoretically show that the coefficients of LSCs in UNet have a large effect on the stability of forward and backward propagation and on the robustness of UNet. Specifically, the hidden features and gradients of UNet at any layer can oscillate, and their oscillation ranges are in fact large, which explains the instability of UNet training. Moreover, UNet is also provably sensitive to perturbed input and predicts an output distant from the desired output, yielding an oscillatory loss and thus oscillatory gradients. We also observe the theoretical benefits of LSC coefficient scaling for the stability of hidden features and gradients and for robustness. Finally, inspired by our theory, we propose an effective coefficient scaling framework, ScaleLong, that scales the coefficients of LSCs in UNet and further improves the training stability of UNet. Experimental results on CIFAR10, CelebA, ImageNet and COCO show that our methods are superior in stabilizing training and yield about 1.5x training acceleration on different diffusion models with UNet or UViT backbones.
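To make the idea of scaling LSC coefficients concrete, here is a minimal sketch in plain Python. It is not the ScaleLong implementation: the abstract does not specify how coefficients are chosen, so the exponentially decaying schedule in `constant_scaling`, the toy `block` function, and the pairing of coefficient index 0 with the outermost skip connection are all assumptions made for illustration.

```python
from typing import List

Vector = List[float]


def block(x: Vector, weight: float) -> Vector:
    """Hypothetical stand-in for a network block: scale, then ReLU."""
    return [max(0.0, weight * v) for v in x]


def constant_scaling(depth: int, kappa: float) -> List[float]:
    """Assumed schedule: the i-th skip connection gets coefficient kappa**i."""
    return [kappa ** i for i in range(depth)]


def toy_unet_forward(
    x: Vector, depth: int, skip_coeffs: List[float], weight: float = 1.0
) -> Vector:
    """Toy UNet-shaped forward pass with a scaled long skip connection per level."""
    if len(skip_coeffs) != depth:
        raise ValueError("need exactly one coefficient per skip connection")

    # Encoder: store each block output so the decoder can reuse it.
    skips: List[Vector] = []
    h = x
    for _ in range(depth):
        h = block(h, weight)
        skips.append(h)

    # Middle block.
    h = block(h, weight)

    # Decoder: the deepest encoder feature is consumed first.
    for i in range(depth):
        skip = skips.pop()
        coeff = skip_coeffs[depth - 1 - i]  # index 0 is the outermost skip
        h = block([a + coeff * b for a, b in zip(h, skip)], weight)
    return h


if __name__ == "__main__":
    x = [1.0, 2.0]
    print(toy_unet_forward(x, 2, [1.0, 1.0]))               # unscaled skips
    print(toy_unet_forward(x, 2, constant_scaling(2, 0.5)))  # scaled skips
```

In this toy setting, shrinking the coefficients reduces how much each skip connection adds to the decoder features, which is the quantity the abstract ties to oscillation of hidden features and gradients. In a real model the coefficients would multiply feature maps inside the UNet or UViT decoder.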
Author Information
Zhongzhan Huang (Sun Yat-Sen University)
Pan Zhou (SEA AI Lab)
Currently, I am a senior Research Scientist at Sea AI Lab of Sea Group. Previously, I worked at Salesforce as a research scientist from 2019 to 2021. I completed my Ph.D. degree in 2019 at the National University of Singapore (NUS), where I was fortunate to be advised by Prof. Jiashi Feng and Prof. Shuicheng Yan. Before studying at NUS, I graduated from Peking University (PKU) in 2016, where I was fortunate to be supervised by Prof. Zhouchen Lin and Prof. Chao Zhang in ZERO Lab. During my research, I have also worked closely with Prof. Xiaotong Yuan. I also spent several wonderful months in 2018 at Georgia Tech as a visiting student hosted by Prof. Huan Xu.
Shuicheng Yan (National University of Singapore, Department of Electrical and Computer Engineering)
Liang Lin (Sun Yat-Sen University)
More from the Same Authors
-
2020 : Task Similarity Aware Meta Learning: Theory-inspired Improvement on MAML »
Pan Zhou -
2021 Spotlight: A Theory-Driven Self-Labeling Refinement Method for Contrastive Representation Learning »
Pan Zhou · Caiming Xiong · Xiaotong Yuan · Steven Chu Hong Hoi -
2021 : Geometric Question Answering Towards Multimodal Numerical Reasoning »
Jiaqi Chen · Jianheng Tang · Jinghui Qin · Xiaodan Liang · Lingbo Liu · Eric Xing · Liang Lin -
2022 Poster: Inception Transformer »
Chenyang Si · Weihao Yu · Pan Zhou · Yichen Zhou · Xinchao Wang · Shuicheng Yan -
2022 : Adan: Adaptive Nesterov Momentum Algorithm for Faster Optimizing Deep Models »
Xingyu Xie · Pan Zhou · Huan Li · Zhouchen Lin · Shuicheng Yan -
2022 : Win: Weight-Decay-Integrated Nesterov Acceleration for Adaptive Gradient Algorithms »
Pan Zhou · Xingyu Xie · Shuicheng Yan -
2022 : Dimension-Reduced Adaptive Gradient Method »
Jingyang Li · Pan Zhou · Kuangyu Ding · Kim-Chuan Toh · Yinyu Ye -
2023 Poster: On Calibrating Diffusion Probabilistic Models »
Tianyu Pang · Cheng Lu · Chao Du · Min Lin · Shuicheng Yan · Zhijie Deng -
2023 Poster: Mutual Information Regularized Offline Reinforcement Learning »
Xiao Ma · Bingyi Kang · Zhongwen Xu · Min Lin · Shuicheng Yan -
2023 Poster: Efficient Diffusion Policies For Offline Reinforcement Learning »
Bingyi Kang · Xiao Ma · Chao Du · Tianyu Pang · Shuicheng Yan -
2023 Poster: Gaussian Mixture Solvers for Diffusion Models »
Hanzhong Guo · Cheng Lu · Fan Bao · Tianyu Pang · Shuicheng Yan · Chao Du · Chongxuan LI -
2022 Spotlight: Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning »
Ziyi Zhang · Weikai Chen · Hui Cheng · Zhen Li · Siyuan Li · Liang Lin · Guanbin Li -
2022 Spotlight: Inception Transformer »
Chenyang Si · Weihao Yu · Pan Zhou · Yichen Zhou · Xinchao Wang · Shuicheng Yan -
2022 Spotlight: Lightning Talks 2B-1 »
Yehui Tang · Jian Wang · Zheng Chen · man zhou · Peng Gao · Chenyang Si · SHANGKUN SUN · Yixing Xu · Weihao Yu · Xinghao Chen · Kai Han · Hu Yu · Yulun Zhang · Chenhui Gou · Teli Ma · Yuanqi Chen · Yunhe Wang · Hongsheng Li · Jinjin Gu · Jianyuan Guo · Qiman Wu · Pan Zhou · Yu Zhu · Jie Huang · Chang Xu · Yichen Zhou · Haocheng Feng · Guodong Guo · yongbing zhang · Ziyi Lin · Feng Zhao · Ge Li · Junyu Han · Jinwei Gu · Jifeng Dai · Chao Xu · Xinchao Wang · Linghe Kong · Shuicheng Yan · Yu Qiao · Chen Change Loy · Xin Yuan · Errui Ding · Yunhe Wang · Deyu Meng · Jingdong Wang · Chongyi Li -
2022 Poster: Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning »
Ziyi Zhang · Weikai Chen · Hui Cheng · Zhen Li · Siyuan Li · Liang Lin · Guanbin Li -
2022 Poster: Structure-Preserving 3D Garment Modeling with Neural Sewing Machines »
Xipeng Chen · Guangrun Wang · Dizhong Zhu · Xiaodan Liang · Philip Torr · Liang Lin -
2021 Poster: Rethinking the Pruning Criteria for Convolutional Neural Network »
Zhongzhan Huang · Wenqi Shao · Xinjiang Wang · Liang Lin · Ping Luo -
2021 Poster: Towards Understanding Why Lookahead Generalizes Better Than SGD and Beyond »
Pan Zhou · Hanshu Yan · Xiaotong Yuan · Jiashi Feng · Shuicheng Yan -
2021 Poster: A Theory-Driven Self-Labeling Refinement Method for Contrastive Representation Learning »
Pan Zhou · Caiming Xiong · Xiaotong Yuan · Steven Chu Hong Hoi -
2020 Poster: Towards Theoretically Understanding Why SGD Generalizes Better Than Adam in Deep Learning »
Pan Zhou · Jiashi Feng · Chao Ma · Caiming Xiong · Steven Chu Hong Hoi · Weinan E -
2020 Poster: Theory-Inspired Path-Regularized Differential Network Architecture Search »
Pan Zhou · Caiming Xiong · Richard Socher · Steven Chu Hong Hoi -
2020 Oral: Theory-Inspired Path-Regularized Differential Network Architecture Search »
Pan Zhou · Caiming Xiong · Richard Socher · Steven Chu Hong Hoi -
2020 Poster: Improving GAN Training with Probability Ratio Clipping and Sample Reweighting »
Yue Wu · Pan Zhou · Andrew Wilson · Eric Xing · Zhiting Hu -
2020 Poster: Auto-Panoptic: Cooperative Multi-Component Architecture Search for Panoptic Segmentation »
Yangxin Wu · Gengwei Zhang · Hang Xu · Xiaodan Liang · Liang Lin -
2019 Poster: Efficient Meta Learning via Minibatch Proximal Update »
Pan Zhou · Xiaotong Yuan · Huan Xu · Shuicheng Yan · Jiashi Feng -
2019 Spotlight: Efficient Meta Learning via Minibatch Proximal Update »
Pan Zhou · Xiaotong Yuan · Huan Xu · Shuicheng Yan · Jiashi Feng -
2018 Poster: New Insight into Hybrid Stochastic Gradient Descent: Beyond With-Replacement Sampling and Convexity »
Pan Zhou · Xiaotong Yuan · Jiashi Feng -
2018 Poster: Symbolic Graph Reasoning Meets Convolutions »
Xiaodan Liang · Zhiting Hu · Hao Zhang · Liang Lin · Eric Xing -
2018 Poster: Efficient Stochastic Gradient Hard Thresholding »
Pan Zhou · Xiaotong Yuan · Jiashi Feng -
2018 Poster: Hybrid Knowledge Routed Modules for Large-scale Object Detection »
ChenHan Jiang · Hang Xu · Xiaodan Liang · Liang Lin -
2018 Poster: Kalman Normalization: Normalizing Internal Representations Across Network Layers »
Guangrun Wang · jiefeng peng · Ping Luo · Xinjiang Wang · Liang Lin -
2014 Poster: Deep Joint Task Learning for Generic Object Extraction »
Xiaolong Wang · Liliang Zhang · Liang Lin · Zhujin Liang · Wangmeng Zuo