Improving Visual Quality of Image Synthesis by A Token-based Generator with Transformers
Yanhong Zeng · Huan Yang · Hongyang Chao · Jianbo Wang · Jianlong Fu

Tue Dec 07 04:30 PM -- 06:00 PM (PST)

We present a new perspective on image synthesis by viewing the task as a visual token generation problem. Unlike existing paradigms that directly synthesize a full image from a single input (e.g., a latent code), the new formulation enables flexible local manipulation of different image regions, which makes it possible to learn content-aware and fine-grained style control for image synthesis. Specifically, the model takes a sequence of latent tokens as input and predicts the visual tokens for synthesizing an image. Under this perspective, we propose a token-based generator (i.e., TokenGAN). In particular, TokenGAN takes two semantically different kinds of tokens as input: learned constant content tokens and style tokens derived from the latent space. Given a sequence of style tokens, TokenGAN controls the image synthesis by assigning the styles to the content tokens through the attention mechanism of a Transformer. We conduct extensive experiments and show that the proposed TokenGAN achieves state-of-the-art results on several widely used image synthesis benchmarks, including FFHQ and LSUN CHURCH at different resolutions. In particular, the generator is able to synthesize high-fidelity images at 1024x1024 resolution, dispensing with convolutions entirely.
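
The abstract describes the core mechanism only at a high level: content tokens receive styles from style tokens via a Transformer's attention. The minimal PyTorch sketch below illustrates one plausible reading of that mechanism as cross-attention; the module name, dimensions, single attention head, and style-token shape are all illustrative assumptions, not the paper's actual implementation.

import torch
import torch.nn as nn

class TokenStyleAttention(nn.Module):
    """Hypothetical sketch of TokenGAN-style token mixing.

    Learned constant content tokens act as queries; style tokens
    mapped from the latent space act as keys and values, so each
    content token receives a content-aware mixture of styles.
    """
    def __init__(self, num_content_tokens=256, dim=512):
        super().__init__()
        # Learned constant content tokens, shared across all images.
        self.content_tokens = nn.Parameter(torch.randn(num_content_tokens, dim))
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, style_tokens):
        # style_tokens: (batch, num_style_tokens, dim), e.g. produced by
        # a mapping network from latent codes (assumed, not shown here).
        b = style_tokens.size(0)
        q = self.to_q(self.content_tokens).expand(b, -1, -1)  # (b, Nc, d)
        k = self.to_k(style_tokens)                           # (b, Ns, d)
        v = self.to_v(style_tokens)                           # (b, Ns, d)
        attn = torch.softmax(q @ k.transpose(1, 2) * self.scale, dim=-1)
        # Styled content tokens, one content-aware style mix per token.
        return attn @ v                                       # (b, Nc, d)

# Illustrative usage with stand-in style tokens:
# mixer = TokenStyleAttention()
# styled = mixer(torch.randn(4, 16, 512))  # -> shape (4, 256, 512)

A decoder (not shown) would then map the styled visual tokens to pixels; per the abstract, this token pipeline reaches 1024x1024 outputs without any convolutions.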

Author Information

Yanhong Zeng (Sun Yat-sen University)
Huan Yang (Microsoft Research)
Hongyang Chao
Jianbo Wang (The University of Tokyo, Tokyo Institute of Technology)
Jianlong Fu (Microsoft Research)
