Null-text Inversion for Editing Real Images using Guided Diffusion Models

Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, Daniel Cohen-Or

2022-11-17 | CVPR 2023 | Image Generation | Text-based Image Editing

Abstract

Recent text-guided diffusion models provide powerful image generation capabilities. Currently, a massive effort is given to enable the modification of these images using text only, as a means to offer intuitive and versatile editing. To edit a real image using these state-of-the-art tools, one must first invert the image with a meaningful text prompt into the pretrained model's domain. In this paper, we introduce an accurate inversion technique and thus facilitate an intuitive text-based modification of the image. Our proposed inversion consists of two novel key components: (i) Pivotal inversion for diffusion models. While current methods aim at mapping random noise samples to a single input image, we use a single pivotal noise vector for each timestep and optimize around it. We demonstrate that a direct inversion is inadequate on its own, but does provide a good anchor for our optimization. (ii) Null-text optimization, where we only modify the unconditional textual embedding that is used for classifier-free guidance, rather than the input text embedding. This allows for keeping both the model weights and the conditional embedding intact and hence enables applying prompt-based editing while avoiding the cumbersome tuning of the model's weights. Our Null-text inversion, based on the publicly available Stable Diffusion model, is extensively evaluated on a variety of images and prompt edits, showing high-fidelity editing of real images.
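
The two components described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the linear `eps_model` is a hypothetical stand-in for the Stable Diffusion U-Net, the `alpha_bar` schedule, the guidance scale of 7.5, and every function name are assumptions made for illustration, and plain gradient descent with an analytic gradient replaces whatever optimizer a real implementation would use. The sketch assumes the pivot trajectory comes from DDIM inversion at guidance scale 1, then tunes one null embedding per timestep so that guided sampling stays on that trajectory.

```python
import numpy as np

rng = np.random.default_rng(0)
D = 4            # toy latent / embedding size (assumption)
T = 10           # number of DDIM steps (assumption)
GUIDANCE = 7.5   # classifier-free guidance scale (assumption)
alpha_bar = np.linspace(0.999, 0.1, T + 1)  # toy cumulative alphas, index 0 = clean

# Hypothetical weights of a linear noise predictor standing in for the U-Net.
W_z = 0.1 * rng.standard_normal((D, D))
W_c = np.eye(D) + 0.1 * rng.standard_normal((D, D))

def eps_model(z, emb):
    """Toy noise predictor, linear in the latent and in the text embedding."""
    return W_z @ z + W_c @ emb

def ddim_coeffs(t_from, t_to):
    """Coefficients of a deterministic DDIM update from step t_from to t_to."""
    a_from, a_to = alpha_bar[t_from], alpha_bar[t_to]
    k_z = np.sqrt(a_to / a_from)
    k_eps = np.sqrt(a_to) * (np.sqrt(1 / a_to - 1) - np.sqrt(1 / a_from - 1))
    return k_z, k_eps

def guided_step(z, t, cond, null, w):
    """One DDIM denoising step t -> t-1 with classifier-free guidance."""
    e_null, e_cond = eps_model(z, null), eps_model(z, cond)
    eps = e_null + w * (e_cond - e_null)
    k_z, k_eps = ddim_coeffs(t, t - 1)
    return k_z * z + k_eps * eps

def ddim_invert(z0, cond):
    """Pivot trajectory z*_0 .. z*_T from DDIM inversion at guidance scale 1."""
    traj = [z0]
    for t in range(T):
        k_z, k_eps = ddim_coeffs(t, t + 1)
        traj.append(k_z * traj[-1] + k_eps * eps_model(traj[-1], cond))
    return traj

def null_text_optimize(pivots, cond, null_init, n_iters=100):
    """Tune one null embedding per timestep so guided sampling tracks the pivots."""
    null, z, nulls = null_init.copy(), pivots[-1], []
    for t in range(T, 0, -1):
        _, k_eps = ddim_coeffs(t, t - 1)
        # Analytic Jacobian d z_{t-1} / d null; valid only for this linear toy model.
        J = k_eps * (1 - GUIDANCE) * W_c
        lr = 1.0 / (2 * np.linalg.norm(J, 2) ** 2)
        for _ in range(n_iters):
            residual = guided_step(z, t, cond, null, GUIDANCE) - pivots[t - 1]
            null = null - lr * 2 * J.T @ residual  # gradient of ||residual||^2
        nulls.append(null.copy())
        z = guided_step(z, t, cond, null, GUIDANCE)  # continue from the tuned step
    return nulls, z

def sample_fixed_null(z_T, cond, null):
    """Baseline: guided sampling with one fixed null embedding for all steps."""
    z = z_T
    for t in range(T, 0, -1):
        z = guided_step(z, t, cond, null, GUIDANCE)
    return z

z0 = rng.standard_normal(D)     # stands in for the encoded real image
cond = rng.standard_normal(D)   # stands in for the prompt embedding
null0 = np.zeros(D)             # stands in for the empty-prompt embedding

pivots = ddim_invert(z0, cond)
nulls, z_rec = null_text_optimize(pivots, cond, null0)
err_opt = np.linalg.norm(z_rec - z0)
err_fixed = np.linalg.norm(sample_fixed_null(pivots[-1], cond, null0) - z0)
print("reconstruction error, optimized null embeddings:", err_opt)
print("reconstruction error, fixed null embedding:", err_fixed)
```

Only the per-timestep null embeddings in `nulls` are changed; `cond`, `W_z`, and `W_c` stay untouched, which mirrors the abstract's point that the conditional embedding and the model weights are kept intact.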

Results

Task                      Dataset    Metric              Value  Model
------------------------  ---------  ------------------  -----  ------------------------------------
Image Generation          PIE-Bench  Background LPIPS    60.67  Null-Text Inversion+Prompt-to-Prompt
Image Generation          PIE-Bench  Background PSNR     27.03  Null-Text Inversion+Prompt-to-Prompt
Image Generation          PIE-Bench  CLIPSIM             24.75  Null-Text Inversion+Prompt-to-Prompt
Image Generation          PIE-Bench  Structure Distance  13.44  Null-Text Inversion+Prompt-to-Prompt
Text-to-Image Generation  PIE-Bench  Background LPIPS    60.67  Null-Text Inversion+Prompt-to-Prompt
Text-to-Image Generation  PIE-Bench  Background PSNR     27.03  Null-Text Inversion+Prompt-to-Prompt
Text-to-Image Generation  PIE-Bench  CLIPSIM             24.75  Null-Text Inversion+Prompt-to-Prompt
Text-to-Image Generation  PIE-Bench  Structure Distance  13.44  Null-Text Inversion+Prompt-to-Prompt
10-shot image generation  PIE-Bench  Background LPIPS    60.67  Null-Text Inversion+Prompt-to-Prompt
10-shot image generation  PIE-Bench  Background PSNR     27.03  Null-Text Inversion+Prompt-to-Prompt
10-shot image generation  PIE-Bench  CLIPSIM             24.75  Null-Text Inversion+Prompt-to-Prompt
10-shot image generation  PIE-Bench  Structure Distance  13.44  Null-Text Inversion+Prompt-to-Prompt
1 Image, 2*2 Stitchi      PIE-Bench  Background LPIPS    60.67  Null-Text Inversion+Prompt-to-Prompt
1 Image, 2*2 Stitchi      PIE-Bench  Background PSNR     27.03  Null-Text Inversion+Prompt-to-Prompt
1 Image, 2*2 Stitchi      PIE-Bench  CLIPSIM             24.75  Null-Text Inversion+Prompt-to-Prompt
1 Image, 2*2 Stitchi      PIE-Bench  Structure Distance  13.44  Null-Text Inversion+Prompt-to-Prompt
