Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Step1X-Edit: A Practical Framework for General Image Editing

Shiyu Liu, Yucheng Han, Peng Xing, Fukun Yin, Rui Wang, Wei Cheng, Jiaqi Liao, Yingming Wang, Honghao Fu, Chunrui Han, Guopeng Li, Yuang Peng, Quan Sun, Jingwei Wu, Yan Cai, Zheng Ge, Ranchen Ming, Lei Xia, Xianfang Zeng, Yibo Zhu, Binxing Jiao, Xiangyu Zhang, Gang Yu, Daxin Jiang

2025-04-24 · Image Editing · Image Manipulation
Paper · PDF · Code (official)

Abstract

In recent years, image editing models have developed rapidly. The recent unveiling of cutting-edge multimodal models such as GPT-4o and Gemini2 Flash has introduced highly promising image editing capabilities. These models demonstrate an impressive aptitude for fulfilling the vast majority of user-driven editing requirements, marking a significant advance in the field of image manipulation. However, a large gap remains between open-source algorithms and these closed-source models. In this paper, we therefore release a state-of-the-art image editing model, called Step1X-Edit, which provides performance comparable to closed-source models such as GPT-4o and Gemini2 Flash. More specifically, we adopt a multimodal LLM to process the reference image and the user's editing instruction. A latent embedding is extracted and integrated with a diffusion image decoder to obtain the target image. To train the model, we build a data generation pipeline that produces a high-quality dataset. For evaluation, we develop GEdit-Bench, a novel benchmark rooted in real-world user instructions. Experimental results on GEdit-Bench demonstrate that Step1X-Edit outperforms existing open-source baselines by a substantial margin and approaches the performance of leading proprietary models, thereby making a significant contribution to the field of image editing.
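The data flow the abstract describes (a multimodal LLM fuses the reference image and the editing instruction into a latent embedding, which then conditions a diffusion decoder that produces the target image) can be sketched roughly as follows. This is a minimal illustrative stub, not the paper's actual architecture or API: every class, function name, and dimension here is an assumption made for clarity.

```python
# Hypothetical sketch of the Step1X-Edit pipeline stages named in the
# abstract. The stubs only mimic the *shape* of the data flow:
# MLLM encode -> latent embedding -> diffusion decode -> edited image.
from dataclasses import dataclass
from typing import List

LATENT_DIM = 8  # illustrative embedding size, not from the paper


@dataclass
class EditRequest:
    reference_image: List[float]  # stand-in for pixel data
    instruction: str              # e.g. "add a red hat"


def multimodal_llm_encode(req: EditRequest) -> List[float]:
    """Stub MLLM: fuse image features and instruction tokens into one latent."""
    text_signal = (sum(ord(c) for c in req.instruction) % 997) / 997.0
    image_signal = sum(req.reference_image) / max(len(req.reference_image), 1)
    return [text_signal, image_signal] * (LATENT_DIM // 2)


def diffusion_decode(latent: List[float], steps: int = 4) -> List[float]:
    """Stub diffusion decoder: iteratively refine an image toward the condition."""
    image = [0.5] * len(latent)  # start from a neutral "noisy" canvas
    for _ in range(steps):
        # each step moves the image halfway toward the conditioning latent
        image = [0.5 * (p + c) for p, c in zip(image, latent)]
    return image


def edit_image(req: EditRequest) -> List[float]:
    latent = multimodal_llm_encode(req)
    return diffusion_decode(latent)


result = edit_image(EditRequest([0.2, 0.4, 0.6], "add a red hat"))
```

The separation mirrors the abstract's design choice: instruction understanding lives entirely in the (stubbed) MLLM encoder, while image synthesis is delegated to the conditioned decoder.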

Results

Task          | Dataset       | Metric               | Value | Model
Image Editing | ImgEdit-Data  | Action               | 2.52  | Step1X-Edit
Image Editing | ImgEdit-Data  | Add                  | 3.88  | Step1X-Edit
Image Editing | ImgEdit-Data  | Adjust               | 3.14  | Step1X-Edit
Image Editing | ImgEdit-Data  | Background           | 3.16  | Step1X-Edit
Image Editing | ImgEdit-Data  | Extract              | 1.76  | Step1X-Edit
Image Editing | ImgEdit-Data  | Hybrid               | 2.64  | Step1X-Edit
Image Editing | ImgEdit-Data  | Overall              | 3.06  | Step1X-Edit
Image Editing | ImgEdit-Data  | Remove               | 2.41  | Step1X-Edit
Image Editing | ImgEdit-Data  | Replace              | 3.4   | Step1X-Edit
Image Editing | ImgEdit-Data  | Style                | 4.63  | Step1X-Edit
Image Editing | GEdit-Bench-EN | Overall              | 6.7   | Step1X-Edit
Image Editing | GEdit-Bench-EN | Perceptual Quality   | 6.76  | Step1X-Edit
Image Editing | GEdit-Bench-EN | Semantic Consistency | 7.09  | Step1X-Edit

Related Papers

NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining (2025-07-18)
Beyond Fully Supervised Pixel Annotations: Scribble-Driven Weakly-Supervised Framework for Image Manipulation Localization (2025-07-17)
Towards Reliable Identification of Diffusion-based Image Manipulations (2025-06-05)
UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation (2025-06-03)
Weakly-supervised Localization of Manipulated Image Regions Using Multi-resolution Learned Features (2025-05-29)
ImgEdit: A Unified Image Editing Dataset and Benchmark (2025-05-26)
RBench-V: A Primary Assessment for Visual Reasoning Models with Multi-modal Outputs (2025-05-22)
My Face Is Mine, Not Yours: Facial Protection Against Diffusion Model Face Swapping (2025-05-21)