Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Scaling up GANs for Text-to-Image Synthesis

Minguk Kang, Jun-Yan Zhu, Richard Zhang, Jaesik Park, Eli Shechtman, Sylvain Paris, Taesung Park

2023-03-09 · CVPR 2023 · Text-to-Image Generation, Image Generation

Abstract

The recent success of text-to-image synthesis has taken the world by storm and captured the general public's imagination. From a technical standpoint, it also marked a drastic change in the favored architecture for generative image models. GANs used to be the de facto choice, with techniques like StyleGAN. With DALL-E 2, autoregressive and diffusion models became the new standard for large-scale generative models overnight. This rapid shift raises a fundamental question: can we scale up GANs to benefit from large datasets like LAION? We find that naïvely increasing the capacity of the StyleGAN architecture quickly becomes unstable. We introduce GigaGAN, a new GAN architecture that far exceeds this limit, demonstrating that GANs are a viable option for text-to-image synthesis. GigaGAN offers three major advantages. First, it is orders of magnitude faster at inference time, taking only 0.13 seconds to synthesize a 512px image. Second, it can synthesize high-resolution images, for example, a 16-megapixel image in 3.66 seconds. Finally, GigaGAN supports latent-space editing applications such as latent interpolation, style mixing, and vector arithmetic operations.
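The abstract lists latent interpolation, style mixing, and vector arithmetic as editing applications. The sketch below shows what these operations typically compute on latent codes in StyleGAN-style generators in general; it is not GigaGAN's released code. Assumptions: latent codes are plain Python lists of floats, a per-layer list of codes stands in for a style-based generator's layer-wise inputs, and the function names (`lerp`, `slerp`, `style_mix`, `latent_arithmetic`) are hypothetical. The generator that would decode these codes into images is not shown.

```python
import math
from typing import List

# Hypothetical representation: a latent code is a flat list of floats.
Vector = List[float]


def lerp(a: Vector, b: Vector, t: float) -> Vector:
    """Linear interpolation: t=0 returns a, t=1 returns b."""
    return [(1.0 - t) * x + t * y for x, y in zip(a, b)]


def slerp(a: Vector, b: Vector, t: float) -> Vector:
    """Spherical linear interpolation along the arc between a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_omega = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    omega = math.acos(cos_omega)
    if omega < 1e-6:
        # Nearly parallel codes: slerp degenerates, so fall back to lerp.
        return lerp(a, b, t)
    sin_omega = math.sin(omega)
    w_a = math.sin((1.0 - t) * omega) / sin_omega
    w_b = math.sin(t * omega) / sin_omega
    return [w_a * x + w_b * y for x, y in zip(a, b)]


def style_mix(codes_a: List[Vector], codes_b: List[Vector], crossover: int) -> List[Vector]:
    """Per-layer style mixing: layers before `crossover` come from A, the rest from B."""
    return [list(c) for c in codes_a[:crossover]] + [list(c) for c in codes_b[crossover:]]


def latent_arithmetic(base: Vector, minus: Vector, plus: Vector, scale: float = 1.0) -> Vector:
    """Shift `base` along the direction (plus - minus), scaled by `scale`."""
    return [x + scale * (p - m) for x, m, p in zip(base, minus, plus)]


if __name__ == "__main__":
    a, b = [1.0, 0.0], [0.0, 1.0]
    print(lerp(a, b, 0.5))   # [0.5, 0.5]
    print(slerp(a, b, 0.5))  # both entries ~0.7071
    print(style_mix([[1.0], [2.0], [3.0]], [[4.0], [5.0], [6.0]], crossover=1))  # [[1.0], [5.0], [6.0]]
    print(latent_arithmetic([1.0, 1.0], [0.0, 0.0], [1.0, 0.0]))  # [2.0, 1.0]
```

Spherical interpolation is a common choice when codes are sampled from a Gaussian, because linear midpoints tend to have a smaller norm than typical samples; plain linear interpolation is often adequate in a learned intermediate style space.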

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Image Generation | ImageNet 256x256 | FID | 3.45 | GigaGAN |
| Image Generation | COCO (Common Objects in Context) | FID | 7.28 | GigaGAN (Zero-shot, 64x64) |
| Image Generation | COCO (Common Objects in Context) | FID | 9.09 | GigaGAN (Zero-shot, 256x256) |
| Text-to-Image Generation | COCO (Common Objects in Context) | FID | 7.28 | GigaGAN (Zero-shot, 64x64) |
| Text-to-Image Generation | COCO (Common Objects in Context) | FID | 9.09 | GigaGAN (Zero-shot, 256x256) |
| 10-shot image generation | COCO (Common Objects in Context) | FID | 7.28 | GigaGAN (Zero-shot, 64x64) |
| 10-shot image generation | COCO (Common Objects in Context) | FID | 9.09 | GigaGAN (Zero-shot, 256x256) |
| 1 Image, 2*2 Stitchi | COCO (Common Objects in Context) | FID | 7.28 | GigaGAN (Zero-shot, 64x64) |
| 1 Image, 2*2 Stitchi | COCO (Common Objects in Context) | FID | 9.09 | GigaGAN (Zero-shot, 256x256) |
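Every row above reports FID (Fréchet Inception Distance), where lower is better. For reference, the standard definition fits Gaussians to Inception-v3 features of real and generated images, with means and covariances $(\mu_r, \Sigma_r)$ and $(\mu_g, \Sigma_g)$; the exact evaluation settings behind these particular numbers (sample counts, preprocessing) are not stated on this page:

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2 \left( \Sigma_r \Sigma_g \right)^{1/2} \right)
```

Because FID is sensitive to sample count and image preprocessing, values are most meaningful when compared under the same evaluation protocol, for example the 64x64 and 256x256 zero-shot COCO rows against other models evaluated the same way.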

Related Papers

fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection (2025-07-17)
FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization (2025-07-17)
A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints (2025-07-17)
Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images (2025-07-17)
FADE: Adversarial Concept Erasure in Flow Models (2025-07-16)
CharaConsist: Fine-Grained Consistent Character Generation (2025-07-15)
CATVis: Context-Aware Thought Visualization (2025-07-15)