Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Zero-shot Image-to-Image Translation

Gaurav Parmar, Krishna Kumar Singh, Richard Zhang, Yijun Li, Jingwan Lu, Jun-Yan Zhu

2023-02-06 · Translation · Text-based Image Editing · Image-to-Image Translation

Paper · PDF · Code (official) · Code

Abstract

Large-scale text-to-image generative models have shown their remarkable ability to synthesize diverse and high-quality images. However, it is still challenging to directly apply these models for editing real images for two reasons. First, it is hard for users to come up with a perfect text prompt that accurately describes every visual detail in the input image. Second, while existing models can introduce desirable changes in certain regions, they often dramatically alter the input content and introduce unexpected changes in unwanted regions. In this work, we propose pix2pix-zero, an image-to-image translation method that can preserve the content of the original image without manual prompting. We first automatically discover editing directions that reflect desired edits in the text embedding space. To preserve the general content structure after editing, we further propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process. In addition, our method does not need additional training for these edits and can directly use the existing pre-trained text-to-image diffusion model. We conduct extensive experiments and show that our method outperforms existing and concurrent works for both real and synthetic image editing.
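The two core ideas in the abstract can be illustrated with a toy sketch: an editing direction is computed as the mean difference between text embeddings of target-domain and source-domain sentences, and cross-attention guidance penalizes deviation of the current cross-attention maps from those of the input image. This is a minimal illustration only, with random vectors standing in for real text-encoder (e.g. CLIP) embeddings and attention maps; the function names and shapes are assumptions, not the official pix2pix-zero API.

```python
import numpy as np

def edit_direction(src_emb, tgt_emb):
    """Editing direction in text-embedding space: the mean difference
    between embeddings of target-domain and source-domain sentences."""
    return tgt_emb.mean(axis=0) - src_emb.mean(axis=0)

def attention_guidance_loss(attn, attn_ref):
    """L2 discrepancy between current cross-attention maps and the
    reference maps recorded from the input image's diffusion pass."""
    return float(np.sum((attn - attn_ref) ** 2))

# Toy stand-ins: 5 "source" and 5 "target" sentence embeddings of dim 8.
rng = np.random.default_rng(0)
src_emb = rng.normal(size=(5, 8))        # e.g. sentences about "cat"
tgt_emb = rng.normal(size=(5, 8)) + 1.0  # e.g. sentences about "dog"

delta = edit_direction(src_emb, tgt_emb)  # added to the prompt embedding at edit time
print(delta.shape)  # (8,)

attn_ref = rng.random((4, 16, 16))  # 4 heads, 16x16 spatial attention maps
print(attention_guidance_loss(attn_ref, attn_ref))  # 0.0 when maps match exactly
```

In the actual method, the guidance loss is differentiated with respect to the latent at each denoising step and used to nudge the sampling trajectory, which is what preserves the input's structure while the edit direction changes its content.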

Results

Task | Dataset | Metric | Value | Model
Image Generation | PIE-Bench | Background LPIPS | 172.22 | DDIM Inversion + Pix2Pix-Zero
Image Generation | PIE-Bench | Background PSNR | 20.44 | DDIM Inversion + Pix2Pix-Zero
Image Generation | PIE-Bench | CLIPSIM | 22.8 | DDIM Inversion + Pix2Pix-Zero
Image Generation | PIE-Bench | Structure Distance | 61.68 | DDIM Inversion + Pix2Pix-Zero
Text-to-Image Generation | PIE-Bench | Background LPIPS | 172.22 | DDIM Inversion + Pix2Pix-Zero
Text-to-Image Generation | PIE-Bench | Background PSNR | 20.44 | DDIM Inversion + Pix2Pix-Zero
Text-to-Image Generation | PIE-Bench | CLIPSIM | 22.8 | DDIM Inversion + Pix2Pix-Zero
Text-to-Image Generation | PIE-Bench | Structure Distance | 61.68 | DDIM Inversion + Pix2Pix-Zero
10-shot image generation | PIE-Bench | Background LPIPS | 172.22 | DDIM Inversion + Pix2Pix-Zero
10-shot image generation | PIE-Bench | Background PSNR | 20.44 | DDIM Inversion + Pix2Pix-Zero
10-shot image generation | PIE-Bench | CLIPSIM | 22.8 | DDIM Inversion + Pix2Pix-Zero
10-shot image generation | PIE-Bench | Structure Distance | 61.68 | DDIM Inversion + Pix2Pix-Zero
1 Image, 2*2 Stitchi | PIE-Bench | Background LPIPS | 172.22 | DDIM Inversion + Pix2Pix-Zero
1 Image, 2*2 Stitchi | PIE-Bench | Background PSNR | 20.44 | DDIM Inversion + Pix2Pix-Zero
1 Image, 2*2 Stitchi | PIE-Bench | CLIPSIM | 22.8 | DDIM Inversion + Pix2Pix-Zero
1 Image, 2*2 Stitchi | PIE-Bench | Structure Distance | 61.68 | DDIM Inversion + Pix2Pix-Zero

Related Papers

- NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining (2025-07-18)
- A Translation of Probabilistic Event Calculus into Markov Decision Processes (2025-07-17)
- Function-to-Style Guidance of LLMs for Code Translation (2025-07-15)
- Speak2Sign3D: A Multi-modal Pipeline for English Speech to American Sign Language Animation (2025-07-09)
- Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings (2025-07-09)
- Unconditional Diffusion for Generative Sequential Recommendation (2025-07-08)
- GRAFT: A Graph-based Flow-aware Agentic Framework for Document-level Machine Translation (2025-07-04)
- TransLaw: Benchmarking Large Language Models in Multi-Agent Simulation of the Collaborative Translation (2025-07-01)