Poster
T2I-CompBench: A Comprehensive Benchmark for Open-world Compositional Text-to-image Generation
Kaiyi Huang · Kaiyue Sun · Enze Xie · Zhenguo Li · Xihui Liu

Tue Dec 12 08:45 AM -- 10:45 AM (PST) @ Great Hall & Hall B1+B2 #225

Although recent text-to-image models generate strikingly high-quality images, current approaches often struggle to compose objects with different attributes and relationships into a complex and coherent scene. We propose T2I-CompBench, a comprehensive benchmark for open-world compositional text-to-image generation, consisting of 6,000 compositional text prompts from 3 categories (attribute binding, object relationships, and complex compositions) and 6 sub-categories (color binding, shape binding, texture binding, spatial relationships, non-spatial relationships, and complex compositions). We further propose several evaluation metrics specifically designed for compositional text-to-image generation, and we explore the potential and limitations of multimodal LLMs for evaluation. We introduce a new approach, Generative mOdel finetuning with Reward-driven Sample selection (GORS), to boost the compositional text-to-image generation abilities of pretrained text-to-image models. We conduct extensive experiments and evaluations to benchmark previous methods on T2I-CompBench and to validate the effectiveness of our proposed evaluation metrics and the GORS approach. The project page is available at https://karine-h.github.io/T2I-CompBench/.
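
To make the idea of reward-driven sample selection concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each generated image already carries a scalar alignment reward from some evaluation metric. The `Sample` class, the `select_samples` and `reward_weighted_loss` helpers, the 0.8 threshold, and the reward-weighted averaging are all hypothetical choices made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Sample:
    """A generated image paired with its prompt (hypothetical structure)."""
    prompt: str
    image_id: str   # stand-in for the generated image itself
    reward: float   # assumed text-image alignment score in [0, 1]


def select_samples(samples: List[Sample], threshold: float = 0.8) -> List[Sample]:
    """Keep only samples whose reward meets an assumed cutoff."""
    return [s for s in samples if s.reward >= threshold]


def reward_weighted_loss(losses: List[float], rewards: List[float]) -> float:
    """Average per-sample losses, scaling each by its reward (an assumption)."""
    if len(losses) != len(rewards):
        raise ValueError("losses and rewards must have the same length")
    if not losses:
        return 0.0
    return sum(r * l for r, l in zip(rewards, losses)) / len(losses)


# Illustrative data only; the rewards are made-up numbers, not results.
candidates = [
    Sample("a red book and a yellow vase", "img_0", 0.91),
    Sample("a red book and a yellow vase", "img_1", 0.42),
    Sample("a metallic spoon on a wooden table", "img_2", 0.85),
]
selected = select_samples(candidates)
```

In a real finetuning loop, the selected samples would form the training set and the per-sample losses would come from the generative model's training objective; both are outside the scope of this sketch.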

Author Information

Kaiyi Huang (The University of Hong Kong)
Kaiyue Sun (The University of Hong Kong)
Enze Xie (The University of Hong Kong)

I have been a PhD student in the Department of Computer Science, The University of Hong Kong (HKU), since 2019, supervised by Prof. Ping Luo and co-supervised by Prof. Wenping Wang. I obtained a B.S. from Nanjing University of Aeronautics and Astronautics (2016) and an M.S. from Tongji University (2019). Since 2018, I have collaborated with researchers in industry, e.g. at Face++ (Megvii), SenseTime, Facebook, Huawei and NVIDIA. My research interest is computer vision in 2D and 3D. I have worked on instance-level detection and self-, semi-, and weakly supervised learning. I developed a few well-known computer vision algorithms, including PolarMask, which was selected as one of the CVPR 2020 Top-10 Influential Papers. I co-developed OpenSelfSup (1k+ stars), a popular self-supervised learning framework. I am looking for a full-time research position. Please contact me!

Zhenguo Li (Noah's Ark Lab, Huawei Tech Investment Co Ltd)
Xihui Liu (The University of Hong Kong)
