Shifted Diffusion for Text-to-image Generation

Yufan Zhou, Bingchen Liu, Yizhe Zhu, Xiao Yang, Changyou Chen, Jinhui Xu

Published: 2022-11-24. Venue: CVPR 2023. Tasks: Text-to-Image Generation, Image Generation.

Abstract

We present Corgi, a novel method for text-to-image generation. Corgi is based on our proposed shifted diffusion model, which achieves better image embedding generation from input text. Unlike the baseline diffusion model used in DALL-E 2, our method seamlessly encodes prior knowledge of the pre-trained CLIP model in its diffusion process by designing a new initialization distribution and a new transition step of the diffusion. Compared to the strong DALL-E 2 baseline, our method performs better in generating image embedding from the text in terms of both efficiency and effectiveness, resulting in better text-to-image generation. Extensive large-scale experiments are conducted and evaluated in terms of both quantitative measures and human evaluation, indicating a stronger generation ability of our method compared to existing ones. Furthermore, our model enables semi-supervised and language-free training for text-to-image generation, where only part or none of the images in the training dataset have an associated caption. Trained with only 1.7% of the images being captioned, our semi-supervised model obtains FID results comparable to DALL-E 2 on zero-shot text-to-image generation evaluated on MS-COCO. Corgi also achieves new state-of-the-art results across different datasets on downstream language-free text-to-image generation tasks, outperforming the previous method, Lafite, by a large margin.
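The abstract describes a diffusion process with a new initialization distribution and a new transition step, but does not give the formulas. The sketch below is one illustrative way such a shift could look, not the authors' implementation: it assumes a forward step of the form z_t = sqrt(1 - beta_t) * z_{t-1} + s_t + sqrt(beta_t) * L * eps, with an assumed shift s_t = (1 - sqrt(1 - beta_t)) * mu and L a Cholesky factor of a covariance Sigma. Under that assumption the chain drifts toward N(mu, Sigma) rather than the standard N(0, I). The names `shifted_forward_step`, `marginal_mean`, `mu`, and `sigma_chol` are hypothetical; `mu` and Sigma stand in for statistics one might estimate from CLIP image embeddings.

```python
import numpy as np

def shifted_forward_step(z_prev, beta_t, mu, sigma_chol, rng):
    """One forward step of the ASSUMED shifted diffusion (illustrative only).

    The sample is shrunk toward `mu` instead of toward zero, and the noise
    has covariance beta_t * Sigma, where Sigma = sigma_chol @ sigma_chol.T.
    """
    a = np.sqrt(1.0 - beta_t)
    shift = (1.0 - a) * mu  # assumed shift term s_t
    noise = sigma_chol @ rng.standard_normal(z_prev.shape)
    return a * z_prev + shift + np.sqrt(beta_t) * noise

def marginal_mean(z0, betas, mu):
    """Closed-form mean of z_T given z_0 under the assumed step above."""
    root_alpha_bar = np.sqrt(np.prod(1.0 - betas))
    return root_alpha_bar * z0 + (1.0 - root_alpha_bar) * mu

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mu = np.array([1.0, -2.0, 0.5])          # hypothetical embedding mean
    sigma_chol = 0.1 * np.eye(3)             # hypothetical covariance factor
    betas = np.linspace(1e-4, 0.02, 1000)    # a common linear noise schedule
    z = np.zeros(3)
    for beta_t in betas:
        z = shifted_forward_step(z, beta_t, mu, sigma_chol, rng)
    print("final sample:", z)
    print("expected mean:", marginal_mean(np.zeros(3), betas, mu))
```

With this choice of shift, the mean recursion m_t - mu = sqrt(1 - beta_t) * (m_{t-1} - mu) shrinks the gap to `mu` at every step, which is why the final sample lands near `mu` regardless of the starting point.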

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Generation | COCO (Common Objects in Context) | FID | 10.6 | Corgi-Semi |
| Image Generation | COCO (Common Objects in Context) | FID | 10.88 | Corgi |
| Image Generation | Multi-Modal-CelebA-HQ | FID | 19.74 | Corgi |
| Text-to-Image Generation | COCO (Common Objects in Context) | FID | 10.6 | Corgi-Semi |
| Text-to-Image Generation | COCO (Common Objects in Context) | FID | 10.88 | Corgi |
| Text-to-Image Generation | Multi-Modal-CelebA-HQ | FID | 19.74 | Corgi |
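The results above are reported as FID (Fréchet Inception Distance), where lower is better. As a rough illustration of what the metric measures, the sketch below computes the Fréchet distance between two Gaussians fitted to feature vectors: ||mu_1 - mu_2||^2 + Tr(Sigma_1 + Sigma_2 - 2 (Sigma_1 Sigma_2)^{1/2}). It assumes the features have already been extracted (standard FID uses Inception-v3 activations, which are omitted here), and the function names `frechet_distance` and `fid_from_features` are hypothetical. It is not the evaluation code behind the numbers in the table.

```python
import numpy as np

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2)."""
    diff = mu1 - mu2
    # Tr((S1 S2)^{1/2}) equals the sum of the square roots of the eigenvalues
    # of S1 @ S2, which are real and non-negative for PSD inputs. Clipping
    # guards against tiny negative values caused by round-off.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * trace_sqrt)

def fid_from_features(feats_real, feats_gen):
    """FID-style score from two (n_samples, n_features) feature arrays."""
    mu_r, mu_g = feats_real.mean(axis=0), feats_gen.mean(axis=0)
    sigma_r = np.cov(feats_real, rowvar=False)
    sigma_g = np.cov(feats_gen, rowvar=False)
    return frechet_distance(mu_r, sigma_r, mu_g, sigma_g)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    real = rng.standard_normal((500, 8))        # stand-in for real-image features
    fake = rng.standard_normal((500, 8)) + 0.5  # stand-in for generated-image features
    print("score:", fid_from_features(real, fake))
```

Two identical feature sets give a score of zero (up to round-off), and the score grows as the means or covariances of the two sets move apart.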
