
CogView: Mastering Text-to-Image Generation via Transformers

Ming Ding, Zhuoyi Yang, Wenyi Hong, Wendi Zheng, Chang Zhou, Da Yin, Junyang Lin, Xu Zou, Zhou Shao, Hongxia Yang, Jie Tang

Date: 2021-05-26
Venue: NeurIPS 2021 12
Tasks: Super-Resolution, Text-to-Image Generation, Text to Image Generation, Image Generation

Abstract

Text-to-image generation in the general domain has long been an open problem, requiring both a powerful generative model and cross-modal understanding. We propose CogView, a 4-billion-parameter Transformer with a VQ-VAE tokenizer, to address this problem. We also demonstrate finetuning strategies for various downstream tasks (e.g., style learning, super-resolution, text-image ranking, and fashion design) and methods to stabilize pretraining (e.g., eliminating NaN losses). CogView achieves state-of-the-art FID on the blurred MS COCO dataset, outperforming previous GAN-based models and DALL-E, a recent similar work.
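The abstract mentions a VQ-VAE tokenizer but this page gives no details. As a rough illustration only, the core quantization step of a generic VQ-VAE maps each continuous feature vector to the index of its nearest codebook entry, and those indices become the discrete image tokens a Transformer can model. The sketch below assumes NumPy; the function names, the toy codebook, and the vector shapes are hypothetical and are not taken from the CogView code.

```python
import numpy as np

def quantize(patch_vectors, codebook):
    """Return, for each row of patch_vectors, the index of the nearest codebook row."""
    patch_vectors = np.asarray(patch_vectors, dtype=float)  # shape (n, d)
    codebook = np.asarray(codebook, dtype=float)            # shape (k, d)
    # Squared Euclidean distance between every vector and every codebook entry: (n, k)
    diffs = patch_vectors[:, None, :] - codebook[None, :, :]
    sq_dists = (diffs ** 2).sum(axis=-1)
    return sq_dists.argmin(axis=1)

def dequantize(token_ids, codebook):
    """Look up the codebook vectors for a sequence of token indices."""
    return np.asarray(codebook, dtype=float)[np.asarray(token_ids)]

# Toy example: three 2-D codebook entries, three input vectors (hypothetical values).
toy_codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
toy_vectors = [[0.1, -0.1], [4.0, 6.0], [0.9, 1.2]]
tokens = quantize(toy_vectors, toy_codebook)
```

In a real tokenizer the input vectors would come from a learned encoder and the codebook would be trained jointly with it; both are omitted here.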

Results

Task                      Dataset                           Metric           Value  Model
Image Generation          COCO (Common Objects in Context)  FID              27.1   CogView
Image Generation          COCO (Common Objects in Context)  FID-1            19.4   CogView
Image Generation          COCO (Common Objects in Context)  FID-2            13.9   CogView
Image Generation          COCO (Common Objects in Context)  FID-4            19.4   CogView
Image Generation          COCO (Common Objects in Context)  FID-8            23.6   CogView
Image Generation          COCO (Common Objects in Context)  Inception score  18.2   CogView
Text-to-Image Generation  COCO (Common Objects in Context)  FID              27.1   CogView
Text-to-Image Generation  COCO (Common Objects in Context)  FID-1            19.4   CogView
Text-to-Image Generation  COCO (Common Objects in Context)  FID-2            13.9   CogView
Text-to-Image Generation  COCO (Common Objects in Context)  FID-4            19.4   CogView
Text-to-Image Generation  COCO (Common Objects in Context)  FID-8            23.6   CogView
Text-to-Image Generation  COCO (Common Objects in Context)  Inception score  18.2   CogView
10-shot image generation  COCO (Common Objects in Context)  FID              27.1   CogView
10-shot image generation  COCO (Common Objects in Context)  FID-1            19.4   CogView
10-shot image generation  COCO (Common Objects in Context)  FID-2            13.9   CogView
10-shot image generation  COCO (Common Objects in Context)  FID-4            19.4   CogView
10-shot image generation  COCO (Common Objects in Context)  FID-8            23.6   CogView
10-shot image generation  COCO (Common Objects in Context)  Inception score  18.2   CogView
1 Image, 2*2 Stitchi      COCO (Common Objects in Context)  FID              27.1   CogView
1 Image, 2*2 Stitchi      COCO (Common Objects in Context)  FID-1            19.4   CogView
1 Image, 2*2 Stitchi      COCO (Common Objects in Context)  FID-2            13.9   CogView
1 Image, 2*2 Stitchi      COCO (Common Objects in Context)  FID-4            19.4   CogView
1 Image, 2*2 Stitchi      COCO (Common Objects in Context)  FID-8            23.6   CogView
1 Image, 2*2 Stitchi      COCO (Common Objects in Context)  Inception score  18.2   CogView
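
The FID rows above are Fréchet Inception Distance scores (lower is better). This listing does not say how they were computed, so the following is only a minimal sketch of the standard Fréchet distance between two Gaussians fitted to feature vectors, assuming NumPy. The names `frechet_distance` and `feature_stats` are illustrative; the Inception feature extractor, and whatever image blurring lies behind the FID-1 to FID-8 variants, is left out.

```python
import numpy as np

def feature_stats(features):
    """Mean vector and covariance matrix of a (num_samples, dim) feature array."""
    features = np.asarray(features, dtype=float)
    return features.mean(axis=0), np.cov(features, rowvar=False)

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Frechet distance between N(mu1, sigma1) and N(mu2, sigma2).

    d^2 = ||mu1 - mu2||^2 + Tr(sigma1) + Tr(sigma2) - 2 Tr((sigma1 sigma2)^(1/2))
    """
    mu1, mu2 = np.asarray(mu1, dtype=float), np.asarray(mu2, dtype=float)
    sigma1 = np.asarray(sigma1, dtype=float)
    sigma2 = np.asarray(sigma2, dtype=float)
    diff = mu1 - mu2
    # For positive semi-definite covariances, the trace of the matrix square
    # root equals the sum of square roots of the eigenvalues of sigma1 @ sigma2.
    eigvals = np.linalg.eigvals(sigma1 @ sigma2)
    # Clip tiny negative values caused by floating-point error before the sqrt.
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    return float(diff @ diff + np.trace(sigma1) + np.trace(sigma2) - 2.0 * trace_sqrt)

# Toy check with 2-D Gaussians (hypothetical values, not Inception features).
identity = np.eye(2)
same = frechet_distance([0.0, 0.0], identity, [0.0, 0.0], identity)
shifted = frechet_distance([0.0, 0.0], identity, [3.0, 4.0], identity)
```

With equal covariances the distance reduces to the squared distance between the means, which is why the shifted toy case depends only on the offset (3, 4).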

Related Papers

SpectraLift: Physics-Guided Spectral-Inversion Network for Self-Supervised Hyperspectral Image Super-Resolution (2025-07-17)
fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection (2025-07-17)
FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization (2025-07-17)
A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints (2025-07-17)
Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images (2025-07-17)
FADE: Adversarial Concept Erasure in Flow Models (2025-07-16)
CharaConsist: Fine-Grained Consistent Character Generation (2025-07-15)