NeurIPS Poster I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing

Poster

I2EBench: A Comprehensive Benchmark for Instruction-based Image Editing

Yiwei Ma · Jiayi Ji · Ke Ye · Weihuang Lin · Zhibin Wang · Yonghan Zheng · Qiang Zhou · Xiaoshuai Sun · Rongrong Ji

West Ballroom A-D #6910

[ Abstract ]

[ Paper] [ Poster] [ OpenReview]

Fri 13 Dec 11 a.m. PST — 2 p.m. PST

Abstract:

Significant progress has been made in the field of Instruction-based Image Editing (IIE). However, evaluating these models poses a significant challenge. A crucial requirement in this field is the establishment of a comprehensive evaluation benchmark for accurately assessing editing results and providing valuable insights for its further development. In response to this need, we propose I2EBench, a comprehensive benchmark designed to automatically evaluate the quality of edited images produced by IIE models from multiple dimensions. I2EBench consists of 2,000+ images for editing, along with 4,000+ corresponding original and diverse instructions. It offers three distinctive characteristics: 1) Comprehensive Evaluation Dimensions: I2EBench comprises 16 evaluation dimensions that cover both high-level and low-level aspects, providing a comprehensive assessment of each IIE model. 2) Human Perception Alignment: To ensure the alignment of our benchmark with human perception, we conducted an extensive user study for each evaluation dimension. 3) Valuable Research Insights: By analyzing the advantages and disadvantages of existing IIE models across the 16 dimensions, we offer valuable research insights to guide future development in the field. We will open-source I2EBench, including all instructions, input images, human annotations, edited images from all evaluated methods, and a simple script for evaluating the results from new IIE models. The code, dataset, and generated images from all IIE models are provided in GitHub: https://github.com/cocoshe/I2EBench.

Chat is not available.