Null-text Inversion for Editing Real Images using Guided Diffusion Models

Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, Daniel Cohen-Or

2022-11-17 | CVPR 2023 | Image Generation, Text-based Image Editing

Abstract

Recent text-guided diffusion models provide powerful image generation capabilities. Considerable effort is currently devoted to modifying these images using text alone, as a means of intuitive and versatile editing. To edit a real image with these state-of-the-art tools, one must first invert the image, together with a meaningful text prompt, into the pretrained model's domain. In this paper, we introduce an accurate inversion technique and thus facilitate intuitive text-based modification of the image. Our proposed inversion consists of two novel key components: (i) Pivotal inversion for diffusion models. While current methods aim at mapping random noise samples to a single input image, we use a single pivotal noise vector for each timestep and optimize around it. We demonstrate that a direct inversion is inadequate on its own, but does provide a good anchor for our optimization. (ii) Null-text optimization, where we modify only the unconditional textual embedding used for classifier-free guidance, rather than the input text embedding. This keeps both the model weights and the conditional embedding intact, and hence enables prompt-based editing while avoiding cumbersome tuning of the model's weights. Our Null-text inversion, based on the publicly available Stable Diffusion model, is extensively evaluated on a variety of images and prompt edits, showing high-fidelity editing of real images.
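
The two components described in the abstract can be illustrated with a minimal toy sketch. It is not the authors' implementation: the scalar "latent", the linear `eps_model`, the `ALPHAS` schedule, the guidance scale `W = 7.5`, and all function names are assumptions chosen so the example runs with the Python standard library alone. The sketch first runs a DDIM-style inversion with guidance scale 1 to obtain a pivot trajectory, then, at each denoising step, adjusts only the unconditional ("null") embedding so that the guided step lands back on the pivot. The finite-difference gradient stands in for the automatic differentiation a real pipeline would use.

```python
import math

T = 10      # number of diffusion steps (toy value)
W = 7.5     # guidance scale used when sampling (assumed value)
ALPHAS = [1.0 - 0.9 * t / T for t in range(T + 1)]  # toy cumulative-alpha schedule

def eps_model(z, c):
    """Hypothetical stand-in for the denoiser: linear in latent z and embedding c."""
    return 0.4 * z + 0.6 * c

def cfg_eps(z, cond, null, w):
    """Classifier-free guidance: extrapolate from the unconditional prediction."""
    e_null = eps_model(z, null)
    return e_null + w * (eps_model(z, cond) - e_null)

def ddim_move(z, eps, a_from, a_to):
    """Deterministic DDIM update between two cumulative-alpha levels."""
    z0_pred = (z - math.sqrt(1 - a_from) * eps) / math.sqrt(a_from)
    return math.sqrt(a_to) * z0_pred + math.sqrt(1 - a_to) * eps

def pivotal_inversion(z0, cond):
    """DDIM inversion with guidance scale 1; returns the pivots z*_0 .. z*_T."""
    traj = [z0]
    for t in range(T):
        eps = eps_model(traj[-1], cond)
        traj.append(ddim_move(traj[-1], eps, ALPHAS[t], ALPHAS[t + 1]))
    return traj

def denoise_step(z, t, cond, null):
    """One guided sampling step from level t to level t - 1."""
    return ddim_move(z, cfg_eps(z, cond, null, W), ALPHAS[t], ALPHAS[t - 1])

def null_text_optimization(pivots, cond, null_init=0.0, iters=200, lr=0.2, h=1e-4):
    """Per-step gradient descent on the null embedding only; cond stays fixed."""
    z, null, nulls = pivots[-1], null_init, []
    for t in range(T, 0, -1):
        target = pivots[t - 1]

        def loss(n):
            return (denoise_step(z, t, cond, n) - target) ** 2

        for _ in range(iters):
            grad = (loss(null + h) - loss(null - h)) / (2 * h)  # central difference
            null -= lr * grad
        nulls.append(null)                     # warm-start the next step from this value
        z = denoise_step(z, t, cond, null)     # continue from the corrected latent
    return nulls, z

def reconstruct_fixed_null(pivots, cond, null=0.0):
    """Baseline: guided sampling from z*_T with an unoptimized null embedding."""
    z = pivots[-1]
    for t in range(T, 0, -1):
        z = denoise_step(z, t, cond, null)
    return z

z0, cond = 0.8, 1.0
pivots = pivotal_inversion(z0, cond)
nulls, z_opt = null_text_optimization(pivots, cond)
z_fixed = reconstruct_fixed_null(pivots, cond)
err_opt, err_fixed = abs(z_opt - z0), abs(z_fixed - z0)
print(f"fixed null error: {err_fixed:.4f}, optimized null error: {err_opt:.2e}")
```

In this toy setting the baseline drifts away from the input because sampling uses a guidance scale larger than the one used for inversion, while the per-step null embeddings pull each step back onto the pivot trajectory without touching `cond` or the model.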

Results

Task	Dataset	Metric	Value	Model
Image Generation	PIE-Bench	Background LPIPS	60.67	Null-Text Inversion+Prompt-to-Prompt
Image Generation	PIE-Bench	Background PSNR	27.03	Null-Text Inversion+Prompt-to-Prompt
Image Generation	PIE-Bench	CLIPSIM	24.75	Null-Text Inversion+Prompt-to-Prompt
Image Generation	PIE-Bench	Structure Distance	13.44	Null-Text Inversion+Prompt-to-Prompt
Text-to-Image Generation	PIE-Bench	Background LPIPS	60.67	Null-Text Inversion+Prompt-to-Prompt
Text-to-Image Generation	PIE-Bench	Background PSNR	27.03	Null-Text Inversion+Prompt-to-Prompt
Text-to-Image Generation	PIE-Bench	CLIPSIM	24.75	Null-Text Inversion+Prompt-to-Prompt
Text-to-Image Generation	PIE-Bench	Structure Distance	13.44	Null-Text Inversion+Prompt-to-Prompt
10-shot image generation	PIE-Bench	Background LPIPS	60.67	Null-Text Inversion+Prompt-to-Prompt
10-shot image generation	PIE-Bench	Background PSNR	27.03	Null-Text Inversion+Prompt-to-Prompt
10-shot image generation	PIE-Bench	CLIPSIM	24.75	Null-Text Inversion+Prompt-to-Prompt
10-shot image generation	PIE-Bench	Structure Distance	13.44	Null-Text Inversion+Prompt-to-Prompt
1 Image, 2*2 Stitchi	PIE-Bench	Background LPIPS	60.67	Null-Text Inversion+Prompt-to-Prompt
1 Image, 2*2 Stitchi	PIE-Bench	Background PSNR	27.03	Null-Text Inversion+Prompt-to-Prompt
1 Image, 2*2 Stitchi	PIE-Bench	CLIPSIM	24.75	Null-Text Inversion+Prompt-to-Prompt
1 Image, 2*2 Stitchi	PIE-Bench	Structure Distance	13.44	Null-Text Inversion+Prompt-to-Prompt

