Long and Short Guidance in Score identity Distillation for One-Step Text-to-Image Generation

Mingyuan Zhou, Zhendong Wang, Huangjie Zheng, Hai Huang

Published: 2024-06-03
Tasks: Text-to-Image Generation, Image Generation

Abstract

Diffusion-based text-to-image generation models trained on extensive text-image pairs have shown the capacity to generate photorealistic images consistent with textual descriptions. However, a significant limitation of these models is their slow sample generation, which requires iterative refinement through the same network. In this paper, we enhance Score identity Distillation (SiD) by developing long and short classifier-free guidance (LSG) to efficiently distill pretrained Stable Diffusion models without using real training data. SiD aims to optimize a model-based explicit score matching loss, utilizing a score-identity-based approximation alongside the proposed LSG for practical computation. By training exclusively with fake images synthesized with its one-step generator, SiD equipped with LSG rapidly improves FID and CLIP scores, achieving state-of-the-art FID performance while maintaining a competitive CLIP score. Specifically, its data-free distillation of Stable Diffusion 1.5 achieves a record low FID of 8.15 on the COCO-2014 validation set, with a CLIP score of 0.304 at an LSG scale of 1.5, and an FID of 9.56 with a CLIP score of 0.313 at an LSG scale of 2. Our code and distilled one-step text-to-image generators are available at https://github.com/mingyuanzhou/SiD-LSG.
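For orientation, the standard classifier-free guidance rule that LSG builds on can be sketched as below. This is a minimal illustration of ordinary classifier-free guidance only, not the paper's long and short guidance scheme or its training loop. The function name `classifier_free_guidance` and the toy vectors are hypothetical stand-ins for real network outputs; the scales 1.5 and 2 are the LSG scales quoted in the abstract.

```python
from typing import List, Sequence


def classifier_free_guidance(
    cond: Sequence[float], uncond: Sequence[float], scale: float
) -> List[float]:
    """Combine conditional and unconditional predictions.

    Standard rule: uncond + scale * (cond - uncond).
    scale = 1 returns the conditional prediction unchanged;
    scale > 1 pushes further in the direction of the text condition.
    """
    if len(cond) != len(uncond):
        raise ValueError("cond and uncond must have the same length")
    return [u + scale * (c - u) for c, u in zip(cond, uncond)]


# Toy vectors (assumed for illustration) standing in for model predictions.
cond = [1.0, 2.0, -1.0]
uncond = [0.0, 1.0, 1.0]

for scale in (1.0, 1.5, 2.0):
    print(scale, classifier_free_guidance(cond, uncond, scale))
```

A larger scale moves the guided prediction further from the unconditional one. That is consistent with the trade-off reported in the abstract, where raising the scale from 1.5 to 2 improves the CLIP score (0.304 to 0.313) while FID worsens (8.15 to 9.56).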

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Generation | COCO (Common Objects in Context) | FID | 8.15 | SiD-LSG (Data-free distillation, zero-shot FID) |
| Image Generation | COCO (Common Objects in Context) | Zero shot FID | 8.15 | SiD-LSG (Data-free distillation, zero-shot FID) |
| Text-to-Image Generation | COCO (Common Objects in Context) | FID | 8.15 | SiD-LSG (Data-free distillation, zero-shot FID) |
| Text-to-Image Generation | COCO (Common Objects in Context) | Zero shot FID | 8.15 | SiD-LSG (Data-free distillation, zero-shot FID) |
| 10-shot image generation | COCO (Common Objects in Context) | FID | 8.15 | SiD-LSG (Data-free distillation, zero-shot FID) |
| 10-shot image generation | COCO (Common Objects in Context) | Zero shot FID | 8.15 | SiD-LSG (Data-free distillation, zero-shot FID) |
| 1 Image, 2*2 Stitchi | COCO (Common Objects in Context) | FID | 8.15 | SiD-LSG (Data-free distillation, zero-shot FID) |
| 1 Image, 2*2 Stitchi | COCO (Common Objects in Context) | Zero shot FID | 8.15 | SiD-LSG (Data-free distillation, zero-shot FID) |
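
As a reminder of what the FID values above measure, the metric is commonly defined as the Fréchet distance between two Gaussians fitted to Inception features of real and generated images. The symbols below are notation introduced here, not taken from this page: $\mu_r, \Sigma_r$ are the feature mean and covariance for real images, and $\mu_g, \Sigma_g$ are those for generated images.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```

Lower values indicate that the generated feature distribution is closer to the real one, which is why the abstract describes 8.15 as a record low.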

Related Papers

fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection (2025-07-17)
FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization (2025-07-17)
A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints (2025-07-17)
Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images (2025-07-17)
FADE: Adversarial Concept Erasure in Flow Models (2025-07-16)
CharaConsist: Fine-Grained Consistent Character Generation (2025-07-15)
CATVis: Context-Aware Thought Visualization (2025-07-15)