Free-form text prompts let users conveniently describe their intentions during image manipulation. Based on the visual latent space of StyleGAN [21] and the text embedding space of CLIP [34], existing studies focus on how to map between these two latent spaces for text-driven attribute manipulation. Currently, the latent mapping between these two spaces is empirically designed, which confines each manipulation model to a single fixed text prompt. In this paper, we propose a method named Free-Form CLIP (FFCLIP), which aims to establish an automatic latent mapping so that one manipulation model can handle free-form text prompts. FFCLIP has a cross-modality semantic modulation module consisting of semantic alignment and semantic injection. The semantic alignment performs the automatic latent mapping via linear transformations with a cross-attention mechanism. After alignment, we inject semantics from the text prompt embeddings into the StyleGAN latent space. For one type of image (e.g., 'human portrait'), one FFCLIP model can be learned to handle free-form text prompts. Meanwhile, we observe that although each training text prompt contains only a single semantic meaning, FFCLIP can leverage text prompts with multiple semantic meanings for image manipulation. In the experiments, we evaluate FFCLIP on three types of images (i.e., 'human portraits', 'cars', and 'churches'). Both visual and numerical results show that FFCLIP effectively produces semantically accurate and visually realistic images. Project page: https://github.com/KumapowerLIU/FFCLIP.
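The two-step idea in the abstract (align with linear transformations and cross attention, then inject text semantics into the latent codes) can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`linear`, `modulate`), the choice of the text embedding as the query and the per-layer latent codes as keys, the additive injection, the random weights, and the toy sizes are all assumptions made only for illustration.

```python
import math
import random


def linear(x, weight):
    """Apply a linear map; weight has len(x) rows and n_out columns."""
    return [sum(x[i] * weight[i][j] for i in range(len(x)))
            for j in range(len(weight[0]))]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(n_in, n_out, rng):
    """Random weights standing in for learned parameters (assumption)."""
    scale = 1.0 / math.sqrt(n_in)
    return [[rng.gauss(0.0, scale) for _ in range(n_out)] for _ in range(n_in)]


def modulate(latents, text_emb, w_q, w_k, w_v):
    """Hypothetical alignment + injection step.

    latents:  one latent code per generator layer (list of vectors)
    text_emb: a single text prompt embedding (vector)
    """
    # Alignment (assumed form): the text embedding is projected to a query,
    # each layer's latent code to a key; scaled dot products give one
    # attention weight per layer.
    query = linear(text_emb, w_q)
    keys = [linear(w, w_k) for w in latents]
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    attn = softmax(scores)

    # Injection (assumed form): a text-derived offset is added to each
    # layer's code, scaled by that layer's attention weight.
    offset = linear(text_emb, w_v)
    edited = [[wi + a * oi for wi, oi in zip(w, offset)]
              for w, a in zip(latents, attn)]
    return edited, attn


rng = random.Random(0)
n_layers, dim = 4, 8  # toy sizes, far smaller than real latent spaces
latents = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_layers)]
text_emb = [rng.gauss(0.0, 1.0) for _ in range(dim)]
w_q = random_matrix(dim, dim, rng)
w_k = random_matrix(dim, dim, rng)
w_v = random_matrix(dim, dim, rng)

edited, attn = modulate(latents, text_emb, w_q, w_k, w_v)
print("per-layer attention weights:", [round(a, 3) for a in attn])
```

In this sketch the attention weights decide how strongly each layer's code is shifted by the prompt, which is one simple way a single model could respond to different prompts without a hand-designed mapping per prompt.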
Author Information
Yiming Zhu (Tsinghua University)
Hongyu Liu (HKUST)
Yibing Song (Tencent AI Lab)
Ziyang Yuan (Huazhong University of Science and Technology)
Xintong Han (Huya Inc)
Chun Yuan (Tsinghua University)
Qifeng Chen (Hong Kong University of Science and Technology)
Jue Wang (Tencent AI Lab)
Related Events (a corresponding poster, oral, or spotlight)
- 2022 Poster: One Model to Edit Them All: Free-Form Text-Driven Image Manipulation with Semantic Modulations
  Thu. Dec 1st through Fri the 2nd Room Hall J #209
More from the Same Authors
- 2022: On the Sparsity of Image Super-resolution Network
  Chenyu Dong · Hailong Ma · Jinjin Gu · Ruofan Zhang · Jieming Li · Chun Yuan
- 2023 Poster: Multi-Prompt Alignment for Multi-Source Unsupervised Domain Adaptation
  Haoran Chen · Xintong Han · Zuxuan Wu · Yu-Gang Jiang
- 2023 Poster: MeGraph: Capturing Long-Range Interactions by Alternating Local and Hierarchical Aggregation on Multi-Scaled Graph Hierarchy
  Honghua Dong · Jiawei Xu · Yu Yang · Rui Zhao · Shiwen Wu · Chun Yuan · Xiu Li · Chris Maddison · Lei Han
- 2022 Spotlight: Stability Analysis and Generalization Bounds of Adversarial Training
  Jiancong Xiao · Yanbo Fan · Ruoyu Sun · Jue Wang · Zhi-Quan Luo
- 2022 Spotlight: Lightning Talks 6B-1
  Yushun Zhang · Duc Nguyen · Jiancong Xiao · Wei Jiang · Yaohua Wang · Yilun Xu · Zhen LI · Anderson Ye Zhang · Ziming Liu · Fangyi Zhang · Gilles Stoltz · Congliang Chen · Gang Li · Yanbo Fan · Ruoyu Sun · Naichen Shi · Yibo Wang · Ming Lin · Max Tegmark · Lijun Zhang · Jue Wang · Ruoyu Sun · Tommi Jaakkola · Senzhang Wang · Zhi-Quan Luo · Xiuyu Sun · Zhi-Quan Luo · Tianbao Yang · Rong Jin
- 2022 Spotlight: Improving 3D-aware Image Synthesis with A Geometry-aware Discriminator
  Zifan Shi · Yinghao Xu · Yujun Shen · Deli Zhao · Qifeng Chen · Dit-Yan Yeung
- 2022 Spotlight: VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
  Zhan Tong · Yibing Song · Jue Wang · Limin Wang
- 2022 Spotlight: Lightning Talks 5B-1
  Devansh Arpit · Xiaojun Xu · Zifan Shi · Ivan Skorokhodov · Shayan Shekarforoush · Zhan Tong · Yiqun Wang · Shichong Peng · Linyi Li · Ivan Skorokhodov · Huan Wang · Yibing Song · David Lindell · Yinghao Xu · Seyed Alireza Moazenipourasil · Sergey Tulyakov · Peter Wonka · Yiqun Wang · Ke Li · David Fleet · Yujun Shen · Yingbo Zhou · Bo Li · Jue Wang · Peter Wonka · Marcus Brubaker · Caiming Xiong · Limin Wang · Deli Zhao · Qifeng Chen · Dit-Yan Yeung
- 2022 Spotlight: Lightning Talks 4B-3
  Zicheng Zhang · Mancheng Meng · Antoine Guedon · Yue Wu · Wei Mao · Zaiyu Huang · Peihao Chen · Shizhe Chen · Yongwei Chen · Keqiang Sun · Yi Zhu · chen rui · Hanhui Li · Dongyu Ji · Ziyan Wu · miaomiao Liu · Pascal Monasse · Yu Deng · Shangzhe Wu · Pierre-Louis Guhur · Jiaolong Yang · Kunyang Lin · Makarand Tapaswi · Zhaoyang Huang · Terrence Chen · Jiabao Lei · Jianzhuang Liu · Vincent Lepetit · Zhenyu Xie · Richard I Hartley · Dinggang Shen · Xiaodan Liang · Runhao Zeng · Cordelia Schmid · Michael Kampffmeyer · Mathieu Salzmann · Ning Zhang · Fangyun Wei · Yabin Zhang · Fan Yang · Qifeng Chen · Wei Ke · Quan Wang · Thomas Li · qingling Cai · Kui Jia · Ivan Laptev · Mingkui Tan · Xin Tong · Hongsheng Li · Xiaodan Liang · Chuang Gan
- 2022 Spotlight: AniFaceGAN: Animatable 3D-Aware Face Image Generation for Video Avatars
  Yue Wu · Yu Deng · Jiaolong Yang · Fangyun Wei · Qifeng Chen · Xin Tong
- 2022 Poster: AniFaceGAN: Animatable 3D-Aware Face Image Generation for Video Avatars
  Yue Wu · Yu Deng · Jiaolong Yang · Fangyun Wei · Qifeng Chen · Xin Tong
- 2022 Poster: Boosting the Transferability of Adversarial Attacks with Reverse Adversarial Perturbation
  Zeyu Qin · Yanbo Fan · Yi Liu · Li Shen · Yong Zhang · Jue Wang · Baoyuan Wu
- 2022 Poster: Improving 3D-aware Image Synthesis with A Geometry-aware Discriminator
  Zifan Shi · Yinghao Xu · Yujun Shen · Deli Zhao · Qifeng Chen · Dit-Yan Yeung
- 2022 Poster: OST: Improving Generalization of DeepFake Detection via One-Shot Test-Time Training
  Liang Chen · Yong Zhang · Yibing Song · Jue Wang · Lingqiao Liu
- 2022 Poster: Stability Analysis and Generalization Bounds of Adversarial Training
  Jiancong Xiao · Yanbo Fan · Ruoyu Sun · Jue Wang · Zhi-Quan Luo
- 2022 Poster: AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition
  Shoufa Chen · Chongjian GE · Zhan Tong · Jiangliu Wang · Yibing Song · Jue Wang · Ping Luo
- 2022 Poster: Planning for Sample Efficient Imitation Learning
  Zhao-Heng Yin · Weirui Ye · Qifeng Chen · Yang Gao
- 2022 Poster: VideoMAE: Masked Autoencoders are Data-Efficient Learners for Self-Supervised Video Pre-Training
  Zhan Tong · Yibing Song · Jue Wang · Limin Wang
- 2021 Poster: Towards Calibrated Model for Long-Tailed Visual Recognition from Prior Perspective
  Zhengzhuo Xu · Zenghao Chai · Chun Yuan
- 2021 Poster: Revitalizing CNN Attention via Transformers in Self-Supervised Visual Representation Learning
  Chongjian GE · Youwei Liang · YIBING SONG · Jianbo Jiao · Jue Wang · Ping Luo
- 2021 Poster: Action-guided 3D Human Motion Prediction
  Jiangxin Sun · Zihang Lin · Xintong Han · Jian-Fang Hu · Jia Xu · Wei-Shi Zheng
- 2017: Competition II: Learning to Run
  Łukasz Kidziński · Carmichael Ong · Sharada Mohanty · Jason Fries · Jennifer Hicks · Zhuobin Zheng · Chun Yuan · Sergey Plis