Plug-and-Play Diffusion Features for Text-Driven Image-to-Image Translation

Narek Tumanyan, Michal Geyer, Shai Bagon, Tali Dekel

Published: 2022-11-22 | Venue: CVPR 2023 | Tasks: Translation, Image Generation, Text-based Image Editing, Image-to-Image Translation

Abstract

Large-scale text-to-image generative models have been a revolutionary breakthrough in the evolution of generative AI, allowing us to synthesize diverse images that convey highly complex visual concepts. However, a pivotal challenge in leveraging such models for real-world content creation tasks is providing users with control over the generated content. In this paper, we present a new framework that takes text-to-image synthesis to the realm of image-to-image translation: given a guidance image and a target text prompt, our method harnesses the power of a pre-trained text-to-image diffusion model to generate a new image that complies with the target text while preserving the semantic layout of the source image. Specifically, we observe and empirically demonstrate that fine-grained control over the generated structure can be achieved by manipulating spatial features and their self-attention inside the model. This results in a simple and effective approach in which features extracted from the guidance image are directly injected into the generation process of the target image. The approach requires no training or fine-tuning and is applicable to both real and generated guidance images. We demonstrate high-quality results on versatile text-guided image translation tasks, including translating sketches, rough drawings, and animations into realistic images, changing the class and appearance of objects in a given image, and modifying global qualities such as lighting and color.
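The injection idea described in the abstract can be illustrated with a toy example. The following is a minimal sketch, not the authors' implementation: it uses a single NumPy self-attention layer with random weights in place of a diffusion model, and it assumes that "injection" means reusing the guidance image's self-attention map while the target's own features supply the values. All names (`attention_map`, `self_attention`, `injected_attn`) are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_map(features, w_q, w_k):
    # Self-attention weights: how strongly each spatial token attends to the others.
    q = features @ w_q
    k = features @ w_k
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1)

def self_attention(features, w_q, w_k, w_v, injected_attn=None):
    # If an attention map is injected, it replaces the one computed from `features`.
    if injected_attn is None:
        attn = attention_map(features, w_q, w_k)
    else:
        attn = injected_attn
    return attn @ (features @ w_v)

# Toy setup (assumption): 16 spatial tokens with 8 channels, random layer weights.
rng = np.random.default_rng(0)
n_tokens, dim = 16, 8
w_q, w_k, w_v = (rng.normal(size=(dim, dim)) for _ in range(3))
guidance_feats = rng.normal(size=(n_tokens, dim))  # stands in for guidance-image features
target_feats = rng.normal(size=(n_tokens, dim))    # stands in for target-generation features

# Extract the attention map from the guidance pass, then inject it into the target pass.
guidance_attn = attention_map(guidance_feats, w_q, w_k)
plain_out = self_attention(target_feats, w_q, w_k, w_v)
injected_out = self_attention(target_feats, w_q, w_k, w_v, injected_attn=guidance_attn)
```

In this sketch the injected pass keeps the target's values but mixes them according to the guidance image's spatial affinities, which is one plausible reading of how layout is preserved without any training or fine-tuning.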

Results

Task | Dataset | Metric | Value | Model
Image Generation | PIE-Bench | Background LPIPS | 113.46 | DDIM Inversion+Plug-and-Play
Image Generation | PIE-Bench | Background PSNR | 22.28 | DDIM Inversion+Plug-and-Play
Image Generation | PIE-Bench | CLIPSIM | 25.41 | DDIM Inversion+Plug-and-Play
Image Generation | PIE-Bench | Structure Distance | 28.22 | DDIM Inversion+Plug-and-Play
Text-to-Image Generation | PIE-Bench | Background LPIPS | 113.46 | DDIM Inversion+Plug-and-Play
Text-to-Image Generation | PIE-Bench | Background PSNR | 22.28 | DDIM Inversion+Plug-and-Play
Text-to-Image Generation | PIE-Bench | CLIPSIM | 25.41 | DDIM Inversion+Plug-and-Play
Text-to-Image Generation | PIE-Bench | Structure Distance | 28.22 | DDIM Inversion+Plug-and-Play
10-shot image generation | PIE-Bench | Background LPIPS | 113.46 | DDIM Inversion+Plug-and-Play
10-shot image generation | PIE-Bench | Background PSNR | 22.28 | DDIM Inversion+Plug-and-Play
10-shot image generation | PIE-Bench | CLIPSIM | 25.41 | DDIM Inversion+Plug-and-Play
10-shot image generation | PIE-Bench | Structure Distance | 28.22 | DDIM Inversion+Plug-and-Play
1 Image, 2*2 Stitchi | PIE-Bench | Background LPIPS | 113.46 | DDIM Inversion+Plug-and-Play
1 Image, 2*2 Stitchi | PIE-Bench | Background PSNR | 22.28 | DDIM Inversion+Plug-and-Play
1 Image, 2*2 Stitchi | PIE-Bench | CLIPSIM | 25.41 | DDIM Inversion+Plug-and-Play
1 Image, 2*2 Stitchi | PIE-Bench | Structure Distance | 28.22 | DDIM Inversion+Plug-and-Play
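
For readers unfamiliar with the Background PSNR metric reported above, the following is a minimal sketch of how a background-restricted PSNR could be computed. It assumes 8-bit images and a boolean mask marking the edited region; the exact PIE-Bench protocol is not stated on this page, so the function name `background_psnr` and the masking convention are assumptions.

```python
import numpy as np

def background_psnr(source, edited, edit_mask, max_value=255.0):
    # PSNR restricted to pixels outside the edited region.
    # Assumption: edit_mask is True where the edit is allowed to change the image.
    background = ~edit_mask
    diff = source[background].astype(np.float64) - edited[background].astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")  # identical backgrounds
    return 10.0 * np.log10(max_value ** 2 / mse)

# Toy 4x4 grayscale example: the top-left 2x2 block is the edited region.
source = np.zeros((4, 4), dtype=np.uint8)
edit_mask = np.zeros((4, 4), dtype=bool)
edit_mask[:2, :2] = True

edited = source.copy()
edited[:2, :2] = 200   # change inside the edit region (ignored by the metric)
edited[3, 3] = 10      # small unintended change in the background

psnr_value = background_psnr(source, edited, edit_mask)
```

A higher value means the edit left the background closer to the source image; changes inside the masked region do not affect the score.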
