
StyleGAN-T: Unlocking the Power of GANs for Fast Large-Scale Text-to-Image Synthesis

Axel Sauer, Tero Karras, Samuli Laine, Andreas Geiger, Timo Aila

Published: 2023-01-23
Tasks: Text-to-Image Generation, Image Generation

Abstract

Text-to-image synthesis has recently seen significant progress thanks to large pretrained language models, large-scale training data, and the introduction of scalable model families such as diffusion and autoregressive models. However, the best-performing models require iterative evaluation to generate a single sample. In contrast, generative adversarial networks (GANs) only need a single forward pass. They are thus much faster, but they currently remain far behind the state-of-the-art in large-scale text-to-image synthesis. This paper aims to identify the necessary steps to regain competitiveness. Our proposed model, StyleGAN-T, addresses the specific requirements of large-scale text-to-image synthesis, such as large capacity, stable training on diverse datasets, strong text alignment, and controllable variation vs. text alignment tradeoff. StyleGAN-T significantly improves over previous GANs and outperforms distilled diffusion models - the previous state-of-the-art in fast text-to-image synthesis - in terms of sample quality and speed.

Results

Task | Dataset | Metric | Value | Model
Image Generation | COCO (Common Objects in Context) | FID | 7.3 | StyleGAN-T (Zero-shot, 64x64)
Image Generation | COCO (Common Objects in Context) | FID | 13.9 | StyleGAN-T (Zero-shot, 256x256)
Text-to-Image Generation | COCO (Common Objects in Context) | FID | 7.3 | StyleGAN-T (Zero-shot, 64x64)
Text-to-Image Generation | COCO (Common Objects in Context) | FID | 13.9 | StyleGAN-T (Zero-shot, 256x256)
10-shot image generation | COCO (Common Objects in Context) | FID | 7.3 | StyleGAN-T (Zero-shot, 64x64)
10-shot image generation | COCO (Common Objects in Context) | FID | 13.9 | StyleGAN-T (Zero-shot, 256x256)
1 Image, 2*2 Stitchi | COCO (Common Objects in Context) | FID | 7.3 | StyleGAN-T (Zero-shot, 64x64)
1 Image, 2*2 Stitchi | COCO (Common Objects in Context) | FID | 13.9 | StyleGAN-T (Zero-shot, 256x256)

Related Papers

fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection (2025-07-17)
FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization (2025-07-17)
A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints (2025-07-17)
Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images (2025-07-17)
FADE: Adversarial Concept Erasure in Flow Models (2025-07-16)
CharaConsist: Fine-Grained Consistent Character Generation (2025-07-15)
CATVis: Context-Aware Thought Visualization (2025-07-15)