CogView: Mastering Text-to-Image Generation via Transformers

Ming Ding, Zhuoyi Yang, Wenyi Hong, Wendi Zheng, Chang Zhou, Da Yin, Junyang Lin, Xu Zou, Zhou Shao, Hongxia Yang, Jie Tang

2021-05-26, NeurIPS 2021
Tasks: Super-Resolution, Text-to-Image Generation, Image Generation


Abstract

Text-to-Image generation in the general domain has long been an open problem, which requires both a powerful generative model and cross-modal understanding. We propose CogView, a 4-billion-parameter Transformer with a VQ-VAE tokenizer, to advance this problem. We also demonstrate finetuning strategies for various downstream tasks (e.g., style learning, super-resolution, text-image ranking, and fashion design) and methods to stabilize pretraining (e.g., eliminating NaN losses). CogView achieves state-of-the-art FID on the blurred MS COCO dataset, outperforming previous GAN-based models and DALL-E, a recent similar work.
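The abstract describes a Transformer that works on discrete image tokens produced by a VQ-VAE tokenizer. The sketch below illustrates that general pipeline under stated assumptions; it is not CogView's code. It quantizes encoder feature vectors to their nearest codebook index, concatenates text and image tokens into one sequence, and forms next-token prediction pairs. Plain Python lists stand in for tensors, and the function names, the `boi_id` marker, and the choice to offset image ids by `text_vocab_size` are all hypothetical illustrations rather than CogView's actual token layout.

```python
from typing import List, Sequence, Tuple


def quantize(features: List[Sequence[float]], codebook: List[Sequence[float]]) -> List[int]:
    """Replace each encoder feature vector with the index of its nearest
    codebook vector (squared Euclidean distance), as in VQ-VAE."""
    indices: List[int] = []
    for f in features:
        best = min(
            range(len(codebook)),
            key=lambda k: sum((a - b) ** 2 for a, b in zip(f, codebook[k])),
        )
        indices.append(best)
    return indices


def build_sequence(text_ids: List[int], image_ids: List[int],
                   text_vocab_size: int, boi_id: int) -> List[int]:
    """Hypothetical layout: text tokens, a begin-of-image marker, then image
    tokens shifted past the text vocabulary so both share one embedding table."""
    return list(text_ids) + [boi_id] + [text_vocab_size + i for i in image_ids]


def next_token_pairs(seq: List[int]) -> Tuple[List[int], List[int]]:
    """Inputs and targets for left-to-right (causal) language modeling:
    the model at position t is trained to predict token t + 1."""
    return seq[:-1], seq[1:]
```

For example, with codebook `[[0, 0], [1, 1], [5, 5]]` the features `[[0.1, -0.2], [4.0, 4.5], [0.9, 1.2]]` map to indices `[0, 2, 1]`. In a full system the feature grid would be flattened in raster order; at generation time the text tokens are given, image tokens are sampled one at a time, and the VQ-VAE decoder turns the resulting indices back into pixels.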

Results

Dataset: COCO (Common Objects in Context). Model: CogView. The same scores are listed under four task leaderboards: Image Generation, Text-to-Image Generation, 10-shot image generation, and 1 Image, 2*2 Stitchi.

Metric	Value
FID	27.1
FID-1	19.4
FID-2	13.9
FID-4	19.4
FID-8	23.6
Inception score	18.2

FID-k is the FID computed after all images are blurred with a Gaussian filter of radius k; plain FID corresponds to no blur. Lower FID is better, and a higher Inception score is better.
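FID fits a Gaussian to Inception-v3 features of real images and another to features of generated images, then measures the Fréchet distance between the two; the FID-k variants apply a Gaussian blur of radius k to all images before feature extraction. The sketch below shows only the distance computation, under a simplifying assumption: covariances are treated as diagonal, whereas standard FID implementations use full covariance matrices and a matrix square root. No Inception network or blur is included, and `frechet_distance_diag` is a hypothetical name.

```python
import math
from statistics import fmean, variance
from typing import List, Sequence


def frechet_distance_diag(real: List[Sequence[float]], fake: List[Sequence[float]]) -> float:
    """Fréchet distance between Gaussians fitted to two feature sets.

    General form: ||mu_r - mu_g||^2 + Tr(S_r + S_g - 2 (S_r S_g)^(1/2)).
    Simplifying assumption: covariances are diagonal, so the trace term
    reduces to a sum over dimensions of (sd_r - sd_g)^2.
    Uses sample variance (n - 1), so each set needs at least two vectors.
    """
    total = 0.0
    for d in range(len(real[0])):
        r = [x[d] for x in real]
        g = [x[d] for x in fake]
        mean_term = (fmean(r) - fmean(g)) ** 2
        sd_term = (math.sqrt(variance(r)) - math.sqrt(variance(g))) ** 2
        total += mean_term + sd_term
    return total
```

Shifting a feature set without changing its spread only adds the squared mean difference: `[[0.0], [2.0]]` versus `[[1.0], [3.0]]` gives a distance of 1.0, while identical sets give 0.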
