Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Zero-shot Image-to-Image Translation

Gaurav Parmar, Krishna Kumar Singh, Richard Zhang, Yijun Li, Jingwan Lu, Jun-Yan Zhu

2023-02-06 · Translation · Text-based Image Editing · Image-to-Image Translation

Paper · PDF · Code (official) · Code

Abstract

Large-scale text-to-image generative models have shown their remarkable ability to synthesize diverse and high-quality images. However, it is still challenging to directly apply these models for editing real images for two reasons. First, it is hard for users to come up with a perfect text prompt that accurately describes every visual detail in the input image. Second, while existing models can introduce desirable changes in certain regions, they often dramatically alter the input content and introduce unexpected changes in unwanted regions. In this work, we propose pix2pix-zero, an image-to-image translation method that can preserve the content of the original image without manual prompting. We first automatically discover editing directions that reflect desired edits in the text embedding space. To preserve the general content structure after editing, we further propose cross-attention guidance, which aims to retain the cross-attention maps of the input image throughout the diffusion process. In addition, our method does not need additional training for these edits and can directly use the existing pre-trained text-to-image diffusion model. We conduct extensive experiments and show that our method outperforms existing and concurrent works for both real and synthetic image editing.
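The two core ideas in the abstract can be illustrated with a toy sketch: an editing direction is computed as the mean difference between text embeddings of target-domain and source-domain sentences, and cross-attention guidance penalizes deviation of the current cross-attention maps from those of the input image. This is a minimal illustration only, with random vectors standing in for real text-encoder (e.g. CLIP) embeddings and attention maps; the function names and shapes are assumptions, not the official pix2pix-zero API.

```python
import numpy as np

def edit_direction(src_emb, tgt_emb):
    """Editing direction in text-embedding space: the mean difference
    between embeddings of target-domain and source-domain sentences."""
    return tgt_emb.mean(axis=0) - src_emb.mean(axis=0)

def attention_guidance_loss(attn, attn_ref):
    """L2 discrepancy between current cross-attention maps and the
    reference maps recorded from the input image's diffusion pass."""
    return float(np.sum((attn - attn_ref) ** 2))

# Toy stand-ins: 5 "source" and 5 "target" sentence embeddings of dim 8.
rng = np.random.default_rng(0)
src_emb = rng.normal(size=(5, 8))        # e.g. sentences about "cat"
tgt_emb = rng.normal(size=(5, 8)) + 1.0  # e.g. sentences about "dog"

delta = edit_direction(src_emb, tgt_emb)  # added to the prompt embedding at edit time
print(delta.shape)  # (8,)

attn_ref = rng.random((4, 16, 16))  # 4 heads, 16x16 spatial attention maps
print(attention_guidance_loss(attn_ref, attn_ref))  # 0.0 when maps match exactly
```

In the actual method, the guidance loss is differentiated with respect to the latent at each denoising step and used to nudge the sampling trajectory, which is what preserves the input's structure while the edit direction changes its content.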

Results

Task | Dataset | Metric | Value | Model
Image Generation | PIE-Bench | Background LPIPS | 172.22 | DDIM Inversion + Pix2Pix-Zero
Image Generation | PIE-Bench | Background PSNR | 20.44 | DDIM Inversion + Pix2Pix-Zero
Image Generation | PIE-Bench | CLIPSIM | 22.8 | DDIM Inversion + Pix2Pix-Zero
Image Generation | PIE-Bench | Structure Distance | 61.68 | DDIM Inversion + Pix2Pix-Zero
Text-to-Image Generation | PIE-Bench | Background LPIPS | 172.22 | DDIM Inversion + Pix2Pix-Zero
Text-to-Image Generation | PIE-Bench | Background PSNR | 20.44 | DDIM Inversion + Pix2Pix-Zero
Text-to-Image Generation | PIE-Bench | CLIPSIM | 22.8 | DDIM Inversion + Pix2Pix-Zero
Text-to-Image Generation | PIE-Bench | Structure Distance | 61.68 | DDIM Inversion + Pix2Pix-Zero
10-shot image generation | PIE-Bench | Background LPIPS | 172.22 | DDIM Inversion + Pix2Pix-Zero
10-shot image generation | PIE-Bench | Background PSNR | 20.44 | DDIM Inversion + Pix2Pix-Zero
10-shot image generation | PIE-Bench | CLIPSIM | 22.8 | DDIM Inversion + Pix2Pix-Zero
10-shot image generation | PIE-Bench | Structure Distance | 61.68 | DDIM Inversion + Pix2Pix-Zero
1 Image, 2*2 Stitchi | PIE-Bench | Background LPIPS | 172.22 | DDIM Inversion + Pix2Pix-Zero
1 Image, 2*2 Stitchi | PIE-Bench | Background PSNR | 20.44 | DDIM Inversion + Pix2Pix-Zero
1 Image, 2*2 Stitchi | PIE-Bench | CLIPSIM | 22.8 | DDIM Inversion + Pix2Pix-Zero
1 Image, 2*2 Stitchi | PIE-Bench | Structure Distance | 61.68 | DDIM Inversion + Pix2Pix-Zero

Related Papers

- NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining (2025-07-18)
- A Translation of Probabilistic Event Calculus into Markov Decision Processes (2025-07-17)
- Function-to-Style Guidance of LLMs for Code Translation (2025-07-15)
- Speak2Sign3D: A Multi-modal Pipeline for English Speech to American Sign Language Animation (2025-07-09)
- Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings (2025-07-09)
- Unconditional Diffusion for Generative Sequential Recommendation (2025-07-08)
- GRAFT: A Graph-based Flow-aware Agentic Framework for Document-level Machine Translation (2025-07-04)
- TransLaw: Benchmarking Large Language Models in Multi-Agent Simulation of the Collaborative Translation (2025-07-01)