


LaCon: Late-Constraint Diffusion for Steerable Guided Image Synthesis

Chang Liu, Rui Li, Kaidong Zhang, Xin Luo, Dong Liu

Date: 2023-05-19
Tasks: Text-to-Image Generation, Conditional Text-to-Image Synthesis, Image Generation, Conditional Image Generation

Abstract

Diffusion models have demonstrated impressive abilities in generating photo-realistic and creative images. To make the generation process more controllable, existing studies, termed early-constraint methods in this paper, incorporate extra conditions into pre-trained diffusion models. Some of them adopt condition-specific modules that handle each condition separately and therefore struggle to generalize to other conditions. Although follow-up studies present unified solutions to the generalization problem, they require extra resources to implement, e.g., additional inputs or parameter optimization, so more flexible and efficient solutions for steerable guided image synthesis are still needed. In this paper, we present an alternative paradigm, Late-Constraint Diffusion (LaCon), to simultaneously integrate various conditions into pre-trained diffusion models. Specifically, LaCon establishes an alignment between the external condition and the internal features of diffusion models, and uses this alignment to incorporate the target condition, guiding the sampling process to produce tailored results. Experimental results on the COCO dataset illustrate the effectiveness and superior generalization capability of LaCon under various conditions and settings. Ablation studies investigate the functions of the different components in LaCon and illustrate its potential as an efficient way to offer flexible controllability for diffusion models.
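The abstract describes the mechanism only at a high level: align internal features with an external condition, then use that alignment to steer sampling. As a rough illustration of that idea, the toy sketch below applies plain gradient-based guidance to a latent vector. Everything in it is an assumption made for illustration: the "aligner" is a fixed linear map `W`, the latent itself stands in for the diffusion features, there is no denoiser, and the names `energy`, `energy_grad`, and `guided_step` are hypothetical. It is not the paper's implementation.

```python
# Toy sketch of condition guidance (NOT the LaCon implementation).
# Assumptions: a fixed linear "aligner" W maps the latent x to a predicted
# condition; guidance moves x down the gradient of the squared error
# between the predicted condition and the target condition.

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def energy(W, x, target):
    """Squared distance between the predicted and the target condition."""
    return sum((p - t) ** 2 for p, t in zip(matvec(W, x), target))

def energy_grad(W, x, target):
    """Analytic gradient: d/dx ||Wx - t||^2 = 2 W^T (Wx - t)."""
    residual = [p - t for p, t in zip(matvec(W, x), target)]
    return [2 * sum(W[i][j] * residual[i] for i in range(len(W)))
            for j in range(len(x))]

def guided_step(W, x, target, scale):
    """One guidance step: nudge the latent toward the target condition."""
    grad = energy_grad(W, x, target)
    return [xi - scale * gi for xi, gi in zip(x, grad)]

# Hypothetical 3-dimensional latent and 2-dimensional condition.
W = [[1.0, 0.0, 0.5],
     [0.0, 1.0, -0.5]]
x = [0.2, -0.4, 1.0]
target = [1.0, 0.0]

energies = [energy(W, x, target)]
for _ in range(20):
    x = guided_step(W, x, target, scale=0.1)
    energies.append(energy(W, x, target))

print(f"condition error: {energies[0]:.4f} -> {energies[-1]:.6f}")
```

In an actual diffusion sampler a step like this would be interleaved with the denoising updates, and the aligner would be a learned network over intermediate features rather than a fixed matrix.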

Results

Dataset: COCO 2017 val (all rows). The same results are listed under four tasks: Image Generation, Text-to-Image Generation, 10-shot image generation, and 1 Image, 2*2 Stitchi. A dash means the metric is not listed for that entry.

| Model | FID | CLIP Score |
|---|---|---|
| LCDG | 20.27 | - |
| LCDG (Color, evaluated under image palette) | 20.61 | 0.258 |
| LCDG (Mask) | 20.94 | 0.2617 |
| LCDG (Edge) | 21.02 | - |
| T2I-Adapter (Sketch) | 21.72 | 0.258 |
| T2I-Adapter (Color, evaluated under image palette) | 26.54 | 0.2613 |
| SD (text) | 27.99 | 0.2673 |
| ControlNet (HED Edge) | 28.09 | 0.2525 |
| T2I-Adapter (Color, evaluated under color stroke) | 30.84 | - |
| SD using SDEdit (evaluated under color stroke) | 32.93 | 0.2257 |
| SD using SDEdit | 71.16 | - |
| SD using SDEdit (evaluated under image palette) | - | 0.2138 |
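
The two metrics in the results above point in opposite directions: a lower FID is better, while a higher CLIP Score is better. The short sketch below shows one way to rank the listed entries with that in mind. The values are copied from the results on this page; the dictionary layout and the `best_by` helper are illustrative choices, not part of any official tooling.

```python
# Values copied from the results listed above (COCO 2017 val).
# None marks a metric that is not listed for that entry.
RESULTS = {
    "LCDG": {"FID": 20.27, "CLIP": None},
    "LCDG (Color, evaluated under image palette)": {"FID": 20.61, "CLIP": 0.258},
    "LCDG (Mask)": {"FID": 20.94, "CLIP": 0.2617},
    "LCDG (Edge)": {"FID": 21.02, "CLIP": None},
    "T2I-Adapter (Sketch)": {"FID": 21.72, "CLIP": 0.258},
    "T2I-Adapter (Color, evaluated under image palette)": {"FID": 26.54, "CLIP": 0.2613},
    "SD (text)": {"FID": 27.99, "CLIP": 0.2673},
    "ControlNet (HED Edge)": {"FID": 28.09, "CLIP": 0.2525},
    "T2I-Adapter (Color, evaluated under color stroke)": {"FID": 30.84, "CLIP": None},
    "SD using SDEdit (evaluated under color stroke)": {"FID": 32.93, "CLIP": 0.2257},
    "SD using SDEdit": {"FID": 71.16, "CLIP": None},
    "SD using SDEdit (evaluated under image palette)": {"FID": None, "CLIP": 0.2138},
}

def best_by(metric, lower_is_better):
    """Return the entry with the best listed value for the given metric."""
    scored = {name: row[metric] for name, row in RESULTS.items()
              if row[metric] is not None}
    pick = min if lower_is_better else max
    return pick(scored, key=scored.get)

best_fid = best_by("FID", lower_is_better=True)     # FID: lower is better
best_clip = best_by("CLIP", lower_is_better=False)  # CLIP Score: higher is better

print("Best FID:", best_fid, RESULTS[best_fid]["FID"])
print("Best CLIP Score:", best_clip, RESULTS[best_clip]["CLIP"])
```

On these numbers the best FID belongs to an LCDG entry, while the text-only SD baseline has the highest CLIP Score, so the ranking depends on which metric is prioritized.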

Related Papers

fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection (2025-07-17)
FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization (2025-07-17)
A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints (2025-07-17)
Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images (2025-07-17)
FADE: Adversarial Concept Erasure in Flow Models (2025-07-16)
CharaConsist: Fine-Grained Consistent Character Generation (2025-07-15)
CATVis: Context-Aware Thought Visualization (2025-07-15)