Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

DF-GAN: A Simple and Effective Baseline for Text-to-Image Synthesis

Ming Tao, Hao Tang, Fei Wu, Xiao-Yuan Jing, Bing-Kun Bao, Changsheng Xu

2020-08-13 | CVPR 2022 1 | Text-to-Image Generation | Text Matching | Image Generation

Abstract

Synthesizing high-quality realistic images from text descriptions is a challenging task. Existing text-to-image Generative Adversarial Networks generally employ a stacked architecture as the backbone, yet three flaws remain. First, the stacked architecture introduces entanglements between generators of different image scales. Second, existing studies prefer to apply and fix extra networks in adversarial learning for text-image semantic consistency, which limits the supervision capability of these networks. Third, the cross-modal attention-based text-image fusion that is widely adopted by previous works is limited to a few specific image scales because of its computational cost. To address these flaws, we propose a simpler but more effective Deep Fusion Generative Adversarial Networks (DF-GAN). Specifically, we propose: (i) a novel one-stage text-to-image backbone that directly synthesizes high-resolution images without entanglements between different generators, (ii) a novel Target-Aware Discriminator composed of Matching-Aware Gradient Penalty and One-Way Output, which enhances text-image semantic consistency without introducing extra networks, and (iii) a novel deep text-image fusion block, which deepens the fusion process to fully fuse text and visual features. Compared with current state-of-the-art methods, our proposed DF-GAN is simpler yet more efficient at synthesizing realistic and text-matching images, and it achieves better performance on widely used datasets.
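
The abstract does not spell out how the deep text-image fusion block works internally. As a rough illustration only, the sketch below assumes that fusion is done by repeatedly modulating each feature channel with a scale and shift predicted from a sentence vector, followed by a ReLU. The function names, dimensions, and random weights are all hypothetical, and convolutions and training are left out, so this should not be read as the authors' implementation.

```python
import random

def linear(weights, bias, x):
    """Dense layer: weights is [out][in], bias is [out], x is [in]."""
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def make_linear(n_in, n_out, rng):
    """Hypothetical random initialisation; a real model would learn these."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def affine(features, text, scale_layer, shift_layer):
    """Scale and shift each channel using values predicted from the text."""
    gamma = linear(*scale_layer, text)  # one scale per channel
    beta = linear(*shift_layer, text)   # one shift per channel
    # features is laid out as [channels][positions]
    return [[g * v + b for v in channel]
            for channel, g, b in zip(features, gamma, beta)]

def relu(features):
    return [[max(0.0, v) for v in channel] for channel in features]

def fusion_block(features, text, layers):
    """Stack several affine + ReLU steps, all conditioned on the same text."""
    out = features
    for scale_layer, shift_layer in layers:
        out = relu(affine(out, text, scale_layer, shift_layer))
    return out

rng = random.Random(0)
TEXT_DIM, CHANNELS, POSITIONS, DEPTH = 8, 4, 6, 2  # illustrative sizes

layers = [(make_linear(TEXT_DIM, CHANNELS, rng),
           make_linear(TEXT_DIM, CHANNELS, rng))
          for _ in range(DEPTH)]
text = [rng.gauss(0, 1) for _ in range(TEXT_DIM)]
features = [[rng.gauss(0, 1) for _ in range(POSITIONS)]
            for _ in range(CHANNELS)]

fused = fusion_block(features, text, layers)
print(len(fused), len(fused[0]))  # channels and positions are preserved
```

Stacking more `(scale, shift)` pairs in `layers` is what "deepening" the fusion would mean in this simplified picture: the text vector gets to modulate the visual features at every step rather than only once.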

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Generation | CUB | Inception score | 4.86 | DF-GAN |
| Image Generation | Multi-Modal-CelebA-HQ | Acc | 17.3 | DFGAN |
| Image Generation | Multi-Modal-CelebA-HQ | FID | 137.6 | DFGAN |
| Image Generation | Multi-Modal-CelebA-HQ | LPIPS | 0.581 | DFGAN |
| Image Generation | Multi-Modal-CelebA-HQ | Real | 14.5 | DFGAN |
| Text-to-Image Generation | CUB | Inception score | 4.86 | DF-GAN |
| Text-to-Image Generation | Multi-Modal-CelebA-HQ | Acc | 17.3 | DFGAN |
| Text-to-Image Generation | Multi-Modal-CelebA-HQ | FID | 137.6 | DFGAN |
| Text-to-Image Generation | Multi-Modal-CelebA-HQ | LPIPS | 0.581 | DFGAN |
| Text-to-Image Generation | Multi-Modal-CelebA-HQ | Real | 14.5 | DFGAN |
| 10-shot image generation | Multi-Modal-CelebA-HQ | Acc | 17.3 | DFGAN |
| 10-shot image generation | Multi-Modal-CelebA-HQ | FID | 137.6 | DFGAN |
| 10-shot image generation | Multi-Modal-CelebA-HQ | LPIPS | 0.581 | DFGAN |
| 10-shot image generation | Multi-Modal-CelebA-HQ | Real | 14.5 | DFGAN |
| 10-shot image generation | CUB | Inception score | 4.86 | DF-GAN |
| 1 Image, 2*2 Stitchi | Multi-Modal-CelebA-HQ | Acc | 17.3 | DFGAN |
| 1 Image, 2*2 Stitchi | Multi-Modal-CelebA-HQ | FID | 137.6 | DFGAN |
| 1 Image, 2*2 Stitchi | Multi-Modal-CelebA-HQ | LPIPS | 0.581 | DFGAN |
| 1 Image, 2*2 Stitchi | Multi-Modal-CelebA-HQ | Real | 14.5 | DFGAN |
| 1 Image, 2*2 Stitchi | CUB | Inception score | 4.86 | DF-GAN |

Related Papers

fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection (2025-07-17)
FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization (2025-07-17)
A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints (2025-07-17)
Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images (2025-07-17)
FADE: Adversarial Concept Erasure in Flow Models (2025-07-16)
CharaConsist: Fine-Grained Consistent Character Generation (2025-07-15)
CATVis: Context-Aware Thought Visualization (2025-07-15)