Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


MasaCtrl: Tuning-Free Mutual Self-Attention Control for Consistent Image Synthesis and Editing

Mingdeng Cao, Xintao Wang, Zhongang Qi, Ying Shan, Xiaohu Qie, Yinqiang Zheng

Published 2023-04-17 · ICCV 2023
Tasks: Text-to-Image Generation, Image Generation, Text-based Image Editing
Links: Paper · PDF · Code (official)

Abstract

Despite the success in large-scale text-to-image generation and text-conditioned image editing, existing methods still struggle to produce consistent generation and editing results. For example, generation approaches usually fail to synthesize multiple images of the same objects/characters but with different views or poses. Meanwhile, existing editing methods either fail to achieve effective complex non-rigid editing while maintaining the overall textures and identity, or require time-consuming fine-tuning to capture the image-specific appearance. In this paper, we develop MasaCtrl, a tuning-free method to achieve consistent image generation and complex non-rigid image editing simultaneously. Specifically, MasaCtrl converts existing self-attention in diffusion models into mutual self-attention, so that it can query correlated local contents and textures from source images for consistency. To further alleviate the query confusion between foreground and background, we propose a mask-guided mutual self-attention strategy, where the mask can be easily extracted from the cross-attention maps. Extensive experiments show that the proposed MasaCtrl can produce impressive results in both consistent image generation and complex non-rigid real image editing.
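The core mechanism described above can be sketched in a few lines: instead of attending to its own keys and values, each self-attention layer in the editing/generation branch queries the keys and values cached from the source image's denoising pass, and a foreground mask (extracted from cross-attention maps in the paper) can restrict which source tokens are visible. The following is a minimal NumPy sketch of that idea, not the authors' implementation; the function names and the `-1e9` masking constant are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def mutual_self_attention(q_tgt, k_src, v_src):
    """Mutual self-attention sketch: the target branch's queries attend to
    the SOURCE image's keys/values, so generated content pulls correlated
    local textures from the source for consistency."""
    d = q_tgt.shape[-1]
    attn = softmax(q_tgt @ k_src.T / np.sqrt(d))
    return attn @ v_src

def masked_mutual_self_attention(q_tgt, k_src, v_src, mask_src):
    """Mask-guided variant: mask_src is a boolean vector over source tokens
    (True = foreground). In MasaCtrl the mask comes from cross-attention
    maps; here it is supplied directly for illustration."""
    d = q_tgt.shape[-1]
    logits = q_tgt @ k_src.T / np.sqrt(d)
    # Suppress masked-out source tokens before the softmax.
    logits = np.where(mask_src[None, :], logits, -1e9)
    return softmax(logits) @ v_src
```

In a diffusion U-Net this substitution would be applied only at selected layers and denoising steps (the paper activates it after an initial phase so layout forms first); masking out the background tokens keeps foreground queries from mixing in background textures.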

Results

Identical values are reported on PIE-Bench under four task tags (Image Generation, Text-to-Image Generation, 10-shot image generation, and "1 Image, 2*2 Stitchi"), collapsed here into one table:

Metric | Value | Model
Background LPIPS | 106.62 | DDIM Inversion+MasaCtrl
Background PSNR | 22.17 | DDIM Inversion+MasaCtrl
CLIPSIM | 23.96 | DDIM Inversion+MasaCtrl
Structure Distance | 28.38 | DDIM Inversion+MasaCtrl

Related Papers

- NoHumansRequired: Autonomous High-Quality Image Editing Triplet Mining (2025-07-18)
- fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
- Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection (2025-07-17)
- FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization (2025-07-17)
- A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing Constraints (2025-07-17)
- Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images (2025-07-17)
- FADE: Adversarial Concept Erasure in Flow Models (2025-07-16)
- CharaConsist: Fine-Grained Consistent Character Generation (2025-07-15)