Haozhe Zhao, Xiaojian Ma, Liang Chen, Shuzheng Si, Rujie Wu, Kaikai An, Peiyu Yu, Minjia Zhang, Qing Li, Baobao Chang
This paper presents UltraEdit, a large-scale (approximately 4 million editing samples), automatically generated dataset for instruction-based image editing. Our key idea is to address the limitations of existing image editing datasets such as InstructPix2Pix and MagicBrush, and to provide a systematic approach to producing massive, high-quality image editing samples. UltraEdit offers several distinct advantages: 1) It features a broader range of editing instructions by leveraging the creativity of large language models (LLMs) alongside in-context editing examples from human raters; 2) Its data sources are based on real images, including photographs and artworks, which provide greater diversity and reduced bias compared to datasets generated solely by text-to-image models; 3) It also supports region-based editing, enhanced by high-quality, automatically produced region annotations. Our experiments show that canonical diffusion-based editing baselines trained on UltraEdit set new records on the MagicBrush and Emu-Edit benchmarks. Our analysis further confirms the crucial role of real image anchors and region-based editing data. The dataset, code, and models are available at https://ultra-editing.github.io.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Editing | ImgEdit-Data | Action | 2.98 | UltraEdit |
| Image Editing | ImgEdit-Data | Add | 3.44 | UltraEdit |
| Image Editing | ImgEdit-Data | Adjust | 2.81 | UltraEdit |
| Image Editing | ImgEdit-Data | Background | 2.83 | UltraEdit |
| Image Editing | ImgEdit-Data | Extract | 2.13 | UltraEdit |
| Image Editing | ImgEdit-Data | Hybrid | 1.91 | UltraEdit |
| Image Editing | ImgEdit-Data | Overall | 2.70 | UltraEdit |
| Image Editing | ImgEdit-Data | Remove | 1.45 | UltraEdit |
| Image Editing | ImgEdit-Data | Replace | 2.96 | UltraEdit |
| Image Editing | ImgEdit-Data | Style | 3.76 | UltraEdit |
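The Overall score in the table appears to be the unweighted mean of the nine per-category scores. A quick sanity check, assuming simple averaging:

```python
# Per-category ImgEdit-Data scores for the UltraEdit-trained model,
# taken from the table above.
scores = {
    "Action": 2.98, "Add": 3.44, "Adjust": 2.81, "Background": 2.83,
    "Extract": 2.13, "Hybrid": 1.91, "Remove": 1.45, "Replace": 2.96,
    "Style": 3.76,
}

# Unweighted mean over the nine categories.
overall = sum(scores.values()) / len(scores)
print(f"{overall:.2f}")  # 2.70
```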