Timezone: »
Text-to-Image diffusion models have made tremendous progress over the past two years, enabling the generation of highly realistic images based on open-domain text descriptions. However, despite their success, text descriptions often struggle to adequately convey detailed controls, even when composed of long and complex texts. Moreover, recent studies have also shown that these models face challenges in understanding such complex texts and generating the corresponding images. Therefore, there is a growing need to enable more control modes beyond text description. In this paper, we introduce Uni-ControlNet, a unified framework that allows for the simultaneous utilization of different local controls (e.g., edge maps, depth map, segmentation masks) and global controls (e.g., CLIP image embeddings) in a flexible and composable manner within one single model. Unlike existing methods, Uni-ControlNet only requires the fine-tuning of two additional adapters upon frozen pre-trained text-to-image diffusion models, eliminating the huge cost of training from scratch. Moreover, thanks to some dedicated adapter designs, Uni-ControlNet only necessitates a constant number (i.e., 2) of adapters, regardless of the number of local or global controls used. This not only reduces the fine-tuning costs and model size, making it more suitable for real-world deployment, but also facilitate composability of different conditions. Through both quantitative and qualitative comparisons, Uni-ControlNet demonstrates its superiority over existing methods in terms of controllability, generation quality and composability. Code is available at https://github.com/ShihaoZhaoZSH/Uni-ControlNet.
Author Information
Shihao Zhao (The University of Hong Kong)

1. Shihao Zhao, Xingjun Ma, Xiang Zheng, James Bailey, Jingjing Chen, Yu-Gang Jiang. Clean-label backdoor attacks on video recognition models. CVPR, 2020. 2. Shihao Zhao, Xingjun Ma, Yisen Wang, James Bailey, Bo Li, Yu-Gang JiangWhat Do Deep Nets Learn? Class-wise Patterns Revealed in the Input Space. Arxiv, 2021. 3. Bojia Zi, Shihao Zhao, Xingjun Ma, Yu-Gang Jiang. Revisiting adversarial robustness distillation: Robust soft labels make student better. ICCV, 2021. 4. Shaozhe Hao, Kai Han, Shihao Zhao, Kwan-Yee K. Wong. ViCo: Detail-Preserving Visual Condition for Personalized Text-to-Image Generation. Arxiv, 2023. 5. Shihao Zhao, Dongdong Chen, Yen-Chun Chen, Jianmin Bao, Shaozhe Hao, Lu Yuan, Kwan-Yee K. Wong. Uni-ControlNet: All-in-One Control to Text-to-Image Diffusion Models. NeurIPS, 2023.
Dongdong Chen (Microsoft Cloud AI)
Yen-Chun Chen (Microsoft)
Jianmin Bao (Microsoft Research)
Shaozhe Hao (University of Hong Kong)
Lu Yuan (Microsoft)
Kwan-Yee K. Wong (The University of Hong Kong)
More from the Same Authors
-
2021 : VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation »
Linjie Li · Jie Lei · Zhe Gan · Licheng Yu · Yen-Chun Chen · Rohit Pillai · Yu Cheng · Luowei Zhou · Xin Wang · William Yang Wang · Tamara L Berg · Mohit Bansal · Jingjing Liu · Lijuan Wang · Zicheng Liu -
2021 Spotlight: Focal Attention for Long-Range Interactions in Vision Transformers »
Jianwei Yang · Chunyuan Li · Pengchuan Zhang · Xiyang Dai · Bin Xiao · Lu Yuan · Jianfeng Gao -
2022 Poster: REVIVE: Regional Visual Representation Matters in Knowledge-Based Visual Question Answering »
Yuanze Lin · Yujia Xie · Dongdong Chen · Yichong Xu · Chenguang Zhu · Lu Yuan -
2022 Poster: OmniVL: One Foundation Model for Image-Language and Video-Language Tasks »
Junke Wang · Dongdong Chen · Zuxuan Wu · Chong Luo · Luowei Zhou · Yucheng Zhao · Yujia Xie · Ce Liu · Yu-Gang Jiang · Lu Yuan -
2023 Poster: Learning from Rich Semantics and Coarse Locations for Long-tailed Object Detection »
Lingchen Meng · Xiyang Dai · Jianwei Yang · Dongdong Chen · Yinpeng Chen · Mengchen Liu · Yi-Ling Chen · Zuxuan Wu · Lu Yuan · Yu-Gang Jiang -
2023 Poster: Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser »
Yung-Hsuan Lai · Yen-Chun Chen · Frank Wang -
2023 Poster: HeadSculpt: Crafting 3D Head Avatars with Text »
Xiao Han · Yukang Cao · Kai Han · Xiatian Zhu · Jiankang Deng · Yi-Zhe Song · Tao Xiang · Kwan-Yee K. Wong -
2022 Spotlight: OmniVL: One Foundation Model for Image-Language and Video-Language Tasks »
Junke Wang · Dongdong Chen · Zuxuan Wu · Chong Luo · Luowei Zhou · Yucheng Zhao · Yujia Xie · Ce Liu · Yu-Gang Jiang · Lu Yuan -
2022 Poster: K-LITE: Learning Transferable Visual Models with External Knowledge »
Sheng Shen · Chunyuan Li · Xiaowei Hu · Yujia Xie · Jianwei Yang · Pengchuan Zhang · Zhe Gan · Lijuan Wang · Lu Yuan · Ce Liu · Kurt Keutzer · Trevor Darrell · Anna Rohrbach · Jianfeng Gao -
2022 Poster: Visual Clues: Bridging Vision and Language Foundations for Image Paragraph Captioning »
Yujia Xie · Luowei Zhou · Xiyang Dai · Lu Yuan · Nguyen Bach · Ce Liu · Michael Zeng -
2022 Poster: S$^3$-NeRF: Neural Reflectance Field from Shading and Shadow under a Single Viewpoint »
Wenqi Yang · Guanying Chen · Chaofeng Chen · Zhenfang Chen · Kwan-Yee K. Wong -
2022 Poster: GLIPv2: Unifying Localization and Vision-Language Understanding »
Haotian Zhang · Pengchuan Zhang · Xiaowei Hu · Yen-Chun Chen · Liunian Li · Xiyang Dai · Lijuan Wang · Lu Yuan · Jenq-Neng Hwang · Jianfeng Gao -
2021 Poster: Stronger NAS with Weaker Predictors »
Junru Wu · Xiyang Dai · Dongdong Chen · Yinpeng Chen · Mengchen Liu · Ye Yu · Zhangyang Wang · Zicheng Liu · Mei Chen · Lu Yuan -
2021 Poster: Focal Attention for Long-Range Interactions in Vision Transformers »
Jianwei Yang · Chunyuan Li · Pengchuan Zhang · Xiyang Dai · Bin Xiao · Lu Yuan · Jianfeng Gao -
2021 Poster: Chasing Sparsity in Vision Transformers: An End-to-End Exploration »
Tianlong Chen · Yu Cheng · Zhe Gan · Lu Yuan · Lei Zhang · Zhangyang Wang -
2020 Poster: Passport-aware Normalization for Deep Model Protection »
Jie Zhang · Dongdong Chen · Jing Liao · Weiming Zhang · Gang Hua · Nenghai Yu -
2020 Poster: GreedyFool: Distortion-Aware Sparse Adversarial Attack »
Xiaoyi Dong · Dongdong Chen · Jianmin Bao · Chuan Qin · Lu Yuan · Weiming Zhang · Nenghai Yu · Dong Chen -
2020 Poster: Large-Scale Adversarial Training for Vision-and-Language Representation Learning »
Zhe Gan · Yen-Chun Chen · Linjie Li · Chen Zhu · Yu Cheng · Jingjing Liu -
2020 Spotlight: Large-Scale Adversarial Training for Vision-and-Language Representation Learning »
Zhe Gan · Yen-Chun Chen · Linjie Li · Chen Zhu · Yu Cheng · Jingjing Liu