Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining

Maksim Kuprashevich, Grigorii Alekseenko, Irina Tolstykh, Georgii Fedorov, Bulat Suleimanov, Vladimir Dokholyan, Aleksandr Gordeev

2025-07-18 · Image Editing · Text-based Image Editing
Paper · PDF

Abstract

Recent advances in generative modeling enable image editing assistants that follow natural language instructions without additional user input. Their supervised training requires millions of triplets: original image, instruction, edited image. Yet mining pixel-accurate examples is hard. Each edit must affect only prompt-specified regions, preserve stylistic coherence, respect physical plausibility, and retain visual appeal. The lack of robust automated edit-quality metrics hinders reliable automation at scale. We present an automated, modular pipeline that mines high-fidelity triplets across domains, resolutions, instruction complexities, and styles. Built on public generative models and running without human intervention, our system uses a task-tuned Gemini validator to score instruction adherence and aesthetics directly, removing any need for segmentation or grounding models. Inversion and compositional bootstrapping enlarge the mined set by approximately 2.2x, enabling large-scale high-fidelity training data. By automating the most repetitive annotation steps, the approach allows a new scale of training without human labeling effort. To democratize research in this resource-intensive area, we release NHR-Edit: an open dataset of 358k high-quality triplets. In the largest cross-dataset evaluation, it surpasses all public alternatives. We also release Bagel-NHR-Edit, an open-source fine-tuned Bagel model, which achieves state-of-the-art metrics in our experiments.
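The core idea of the pipeline — generate candidate triplets, score each with a validator model, and keep only those that clear a quality bar — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the `Triplet` class, the `validate` heuristic (standing in for the task-tuned Gemini validator), and the score threshold are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class Triplet:
    original: str     # source image (a path or ID; strings stand in for pixels here)
    instruction: str  # natural-language edit instruction
    edited: str       # candidate edited image

def validate(triplet: Triplet) -> dict:
    """Stand-in for the validator model: returns instruction-adherence and
    aesthetics scores. A toy heuristic replaces the real model here."""
    adherence = 9.0 if triplet.edited != triplet.original else 0.0
    aesthetics = 8.0
    return {"adherence": adherence, "aesthetics": aesthetics}

def mine_triplets(candidates: list[Triplet], min_score: float = 7.0) -> list[Triplet]:
    """Keep only candidates whose lowest validator score clears the threshold."""
    kept = []
    for t in candidates:
        scores = validate(t)
        if min(scores.values()) >= min_score:
            kept.append(t)
    return kept
```

Because the validator scores adherence and aesthetics directly on the image pair, the loop needs no segmentation or grounding models, which is what lets the whole process run without human intervention.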

Results

Task          | Dataset        | Metric               | Value | Model
------------- | -------------- | -------------------- | ----- | --------------
Image Editing | ImgEdit-Data   | Action               | 3.95  | BAGEL-NHR-EDIT
Image Editing | ImgEdit-Data   | Add                  | 4.19  | BAGEL-NHR-EDIT
Image Editing | ImgEdit-Data   | Adjust               | 3.55  | BAGEL-NHR-EDIT
Image Editing | ImgEdit-Data   | Background           | 3.42  | BAGEL-NHR-EDIT
Image Editing | ImgEdit-Data   | Extract              | 1.62  | BAGEL-NHR-EDIT
Image Editing | ImgEdit-Data   | Hybrid               | 2.94  | BAGEL-NHR-EDIT
Image Editing | ImgEdit-Data   | Overall              | 3.39  | BAGEL-NHR-EDIT
Image Editing | ImgEdit-Data   | Remove               | 3.18  | BAGEL-NHR-EDIT
Image Editing | ImgEdit-Data   | Replace              | 3.77  | BAGEL-NHR-EDIT
Image Editing | ImgEdit-Data   | Style                | 4.30  | BAGEL-NHR-EDIT
Image Editing | GEdit-Bench-EN | Overall              | 7.12  | BAGEL-NHR-EDIT
Image Editing | GEdit-Bench-EN | Perceptual Quality   | 6.88  | BAGEL-NHR-EDIT
Image Editing | GEdit-Bench-EN | Semantic Consistency | 8.07  | BAGEL-NHR-EDIT

Related Papers

UniWorld-V1: High-Resolution Semantic Encoders for Unified Visual Understanding and Generation (2025-06-03)
Cora: Correspondence-aware image editing using few step diffusion (2025-05-29)
ImgEdit: A Unified Image Editing Dataset and Benchmark (2025-05-26)
Emerging Properties in Unified Multimodal Pretraining (2025-05-20)
Step1X-Edit: A Practical Framework for General Image Editing (2025-04-24)
POEM: Precise Object-level Editing via MLLM control (2025-04-10)
ILLUME+: Illuminating Unified MLLM with Dual Visual Tokenization and Diffusion Refinement (2025-04-02)
KV-Edit: Training-Free Image Editing for Precise Background Preservation (2025-02-24)