Scaling up GANs for Text-to-Image Synthesis

Minguk Kang, Jun-Yan Zhu, Richard Zhang, Jaesik Park, Eli Shechtman, Sylvain Paris, Taesung Park

2023-03-09 | CVPR 2023 | Text-to-Image Generation, Image Generation

Abstract

The recent success of text-to-image synthesis has taken the world by storm and captured the general public's imagination. From a technical standpoint, it also marked a drastic change in the favored architecture for designing generative image models. GANs used to be the de facto choice, with techniques like StyleGAN. With DALL-E 2, auto-regressive and diffusion models became the new standard for large-scale generative models overnight. This rapid shift raises a fundamental question: can we scale up GANs to benefit from large datasets like LAION? We find that naïvely increasing the capacity of the StyleGAN architecture quickly becomes unstable. We introduce GigaGAN, a new GAN architecture that far exceeds this limit, demonstrating GANs as a viable option for text-to-image synthesis. GigaGAN offers three major advantages. First, it is orders of magnitude faster at inference time, taking only 0.13 seconds to synthesize a 512px image. Second, it can synthesize high-resolution images, for example, a 16-megapixel image in 3.66 seconds. Finally, GigaGAN supports various latent space editing applications such as latent interpolation, style mixing, and vector arithmetic operations.
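The three latent space editing operations named in the abstract can be illustrated with a small NumPy sketch. This is not GigaGAN's code: the latent dimension, the number of layers, the function names, and the StyleGAN-style per-layer crossover used for style mixing are all assumptions made for illustration, and no generator is involved.

```python
import numpy as np

# Hypothetical sizes chosen for the example; the real model's sizes differ.
LATENT_DIM = 8
NUM_LAYERS = 4

rng = np.random.default_rng(0)


def interpolate(z_a, z_b, t):
    """Latent interpolation: blend two latent codes linearly, t in [0, 1]."""
    return (1.0 - t) * z_a + t * z_b


def style_mix(w_a, w_b, crossover):
    """Style mixing (StyleGAN-style assumption): layers before `crossover`
    take their style code from w_a, the remaining layers from w_b.
    Both inputs have shape (NUM_LAYERS, LATENT_DIM)."""
    return np.concatenate([w_a[:crossover], w_b[crossover:]], axis=0)


def apply_direction(z, z_with, z_without, strength=1.0):
    """Vector arithmetic: move z along the direction (z_with - z_without)."""
    return z + strength * (z_with - z_without)


z_a = rng.standard_normal(LATENT_DIM)
z_b = rng.standard_normal(LATENT_DIM)
z_mid = interpolate(z_a, z_b, 0.5)

w_a = rng.standard_normal((NUM_LAYERS, LATENT_DIM))
w_b = rng.standard_normal((NUM_LAYERS, LATENT_DIM))
w_mix = style_mix(w_a, w_b, crossover=2)

z_edit = apply_direction(z_a, z_b, z_a)
```

In a real pipeline each resulting code would be passed to the generator to produce an image. Here the arrays only show how the codes are combined.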

Results

Task	Dataset	Metric	Value	Model
Image Generation	ImageNet 256x256	FID	3.45	GigaGAN
Image Generation	COCO (Common Objects in Context)	FID	7.28	GigaGAN (Zero-shot, 64x64)
Image Generation	COCO (Common Objects in Context)	FID	9.09	GigaGAN (Zero-shot, 256x256)
Text-to-Image Generation	COCO (Common Objects in Context)	FID	7.28	GigaGAN (Zero-shot, 64x64)
Text-to-Image Generation	COCO (Common Objects in Context)	FID	9.09	GigaGAN (Zero-shot, 256x256)
10-shot image generation	COCO (Common Objects in Context)	FID	7.28	GigaGAN (Zero-shot, 64x64)
10-shot image generation	COCO (Common Objects in Context)	FID	9.09	GigaGAN (Zero-shot, 256x256)
1 Image, 2*2 Stitchi	COCO (Common Objects in Context)	FID	7.28	GigaGAN (Zero-shot, 64x64)
1 Image, 2*2 Stitchi	COCO (Common Objects in Context)	FID	9.09	GigaGAN (Zero-shot, 256x256)
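
The FID values above are Fréchet distances between Gaussian fits to feature statistics of real and generated images (lower is better). The following is a minimal sketch of that distance, assuming the means and covariances are already available. Standard FID extracts the features with an Inception network, which is omitted here, and the function names are hypothetical.

```python
import numpy as np


def feature_stats(features):
    """Mean vector and covariance matrix of an (n_samples, dim) feature array."""
    return features.mean(axis=0), np.cov(features, rowvar=False)


def frechet_distance(mu1, sigma1, mu2, sigma2):
    """||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 * (sigma1 @ sigma2)^(1/2))."""
    diff = mu1 - mu2
    # For positive semi-definite covariances, the trace of the matrix square
    # root of sigma1 @ sigma2 equals the sum of the square roots of its
    # eigenvalues, which are real and non-negative up to numerical error.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * trace_sqrt)


# Toy check: unit covariances and a mean shift of 1 in each of 3 dimensions.
d_shift = frechet_distance(np.zeros(3), np.eye(3), np.ones(3), np.eye(3))
```

With identical statistics the distance is zero. With equal covariances it reduces to the squared distance between the means.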
