Minguk Kang, Jun-Yan Zhu, Richard Zhang, Jaesik Park, Eli Shechtman, Sylvain Paris, Taesung Park
The recent success of text-to-image synthesis has taken the world by storm and captured the general public's imagination. From a technical standpoint, it also marked a drastic change in the favored architecture for designing generative image models. GANs used to be the de facto choice, with techniques like StyleGAN. With DALL-E 2, auto-regressive and diffusion models became the new standard for large-scale generative models overnight. This rapid shift raises a fundamental question: can we scale up GANs to benefit from large datasets like LAION? We find that naïvely increasing the capacity of the StyleGAN architecture quickly becomes unstable. We introduce GigaGAN, a new GAN architecture that far exceeds this limit, demonstrating GANs as a viable option for text-to-image synthesis. GigaGAN offers three major advantages. First, it is orders of magnitude faster at inference time, taking only 0.13 seconds to synthesize a 512px image. Second, it can synthesize high-resolution images, for example, 16-megapixel images in 3.66 seconds. Finally, GigaGAN supports various latent space editing applications such as latent interpolation, style mixing, and vector arithmetic operations.
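The three latent space editing operations named in the abstract can be illustrated with a minimal NumPy sketch. This is not GigaGAN's code: the latent dimension (128), the number of style layers (8), and the helper names `lerp`, `slerp`, and `style_mix` are assumptions chosen for illustration, and no generator is involved. The sketch only shows how the latent codes themselves would be combined before being passed to a generator.

```python
import numpy as np

def lerp(z0, z1, t):
    """Linear interpolation between two latent codes."""
    return (1.0 - t) * z0 + t * z1

def slerp(z0, z1, t):
    """Spherical interpolation; falls back to lerp for nearly parallel codes."""
    u0 = z0 / np.linalg.norm(z0)
    u1 = z1 / np.linalg.norm(z1)
    omega = np.arccos(np.clip(np.dot(u0, u1), -1.0, 1.0))
    if omega < 1e-6:
        return lerp(z0, z1, t)
    return (np.sin((1.0 - t) * omega) * z0 + np.sin(t * omega) * z1) / np.sin(omega)

def style_mix(w_a, w_b, crossover):
    """Take per-layer styles from w_a before `crossover` and from w_b after it."""
    return np.concatenate([w_a[:crossover], w_b[crossover:]], axis=0)

rng = np.random.default_rng(0)
dim, num_layers = 128, 8  # hypothetical sizes, not taken from the paper

# Latent interpolation: a 5-step path between two random codes.
z0, z1 = rng.standard_normal(dim), rng.standard_normal(dim)
path = [slerp(z0, z1, t) for t in np.linspace(0.0, 1.0, 5)]

# Vector arithmetic: move a code part of the way along an edit direction.
direction = z1 - z0
z_edit = z0 + 0.5 * direction

# Style mixing: coarse layers from one code, fine layers from another.
w_a = rng.standard_normal((num_layers, dim))
w_b = rng.standard_normal((num_layers, dim))
w_mixed = style_mix(w_a, w_b, crossover=4)
```

Spherical interpolation is a common choice for Gaussian latents because intermediate points keep a norm closer to that of typical samples; whether a given model prefers it over linear interpolation is an empirical question.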
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Generation | ImageNet 256x256 | FID | 3.45 | GigaGAN |
| Image Generation | COCO (Common Objects in Context) | FID | 7.28 | GigaGAN (Zero-shot, 64x64) |
| Image Generation | COCO (Common Objects in Context) | FID | 9.09 | GigaGAN (Zero-shot, 256x256) |
| Text-to-Image Generation | COCO (Common Objects in Context) | FID | 7.28 | GigaGAN (Zero-shot, 64x64) |
| Text-to-Image Generation | COCO (Common Objects in Context) | FID | 9.09 | GigaGAN (Zero-shot, 256x256) |
| 10-shot image generation | COCO (Common Objects in Context) | FID | 7.28 | GigaGAN (Zero-shot, 64x64) |
| 10-shot image generation | COCO (Common Objects in Context) | FID | 9.09 | GigaGAN (Zero-shot, 256x256) |
| 1 Image, 2*2 Stitchi | COCO (Common Objects in Context) | FID | 7.28 | GigaGAN (Zero-shot, 64x64) |
| 1 Image, 2*2 Stitchi | COCO (Common Objects in Context) | FID | 9.09 | GigaGAN (Zero-shot, 256x256) |
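
For readers unfamiliar with the metric reported in the table, the standard definition of the Fréchet Inception Distance (FID) is given below. The symbols are not taken from this page: $\mu_r, \Sigma_r$ and $\mu_g, \Sigma_g$ denote the mean and covariance of Inception features computed on real and generated images, respectively. Lower values are better. The evaluation protocol behind the numbers above (sample count, reference set) is not stated here.

$$
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2 + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
$$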