Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

MaskGIT: Masked Generative Image Transformer

Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, William T. Freeman

2022-02-08 · CVPR 2022
Tasks: Text-to-Image Generation, Image Reconstruction, Image Outpainting, Image Generation, Image Manipulation

Abstract

Generative transformers have experienced rapid popularity growth in the computer vision community in synthesizing high-fidelity and high-resolution images. The best generative transformer models so far, however, still treat an image naively as a sequence of tokens, and decode an image sequentially following the raster scan ordering (i.e. line-by-line). We find this strategy neither optimal nor efficient. This paper proposes a novel image synthesis paradigm using a bidirectional transformer decoder, which we term MaskGIT. During training, MaskGIT learns to predict randomly masked tokens by attending to tokens in all directions. At inference time, the model begins with generating all tokens of an image simultaneously, and then refines the image iteratively conditioned on the previous generation. Our experiments demonstrate that MaskGIT significantly outperforms the state-of-the-art transformer model on the ImageNet dataset, and accelerates autoregressive decoding by up to 64x. Besides, we illustrate that MaskGIT can be easily extended to various image editing tasks, such as inpainting, extrapolation, and image manipulation.
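
The two phases described in the abstract, random masking during training and parallel decoding with iterative refinement at inference, can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the cosine masking schedule, the confidence-based re-masking rule, the `MASK_ID` sentinel, and the `toy_model` stand-in (which returns random probabilities instead of running a bidirectional transformer) are all assumptions made for the example.

```python
import math
import numpy as np

MASK_ID = -1  # hypothetical sentinel marking a masked token position


def mask_ratio(progress: float) -> float:
    """Assumed cosine schedule: fraction of tokens masked at progress in [0, 1]."""
    return math.cos(0.5 * math.pi * progress)


def random_mask(tokens: np.ndarray, rng: np.random.Generator):
    """Training-side sketch: hide a random subset of tokens for the model to predict."""
    ratio = mask_ratio(rng.uniform())
    n_mask = max(1, int(round(ratio * tokens.size)))
    idx = rng.choice(tokens.size, size=n_mask, replace=False)
    masked = tokens.copy()
    masked[idx] = MASK_ID
    return masked, idx


def toy_model(tokens: np.ndarray, vocab_size: int, rng: np.random.Generator) -> np.ndarray:
    """Stand-in for the bidirectional transformer: random per-position probabilities."""
    logits = rng.normal(size=(tokens.size, vocab_size))
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)


def iterative_decode(num_tokens: int, vocab_size: int, steps: int,
                     rng: np.random.Generator) -> np.ndarray:
    """Inference-side sketch: predict all tokens at once, then refine over `steps` passes."""
    tokens = np.full(num_tokens, MASK_ID, dtype=np.int64)  # start fully masked
    for t in range(steps):
        probs = toy_model(tokens, vocab_size, rng)
        pred = probs.argmax(axis=1)
        conf = probs.max(axis=1)

        is_masked = tokens == MASK_ID
        pred = np.where(is_masked, pred, tokens)   # tokens fixed earlier stay fixed
        conf = np.where(is_masked, conf, np.inf)   # and are never re-masked

        # Number of positions that remain masked after this pass (0 on the last pass).
        n_keep_masked = int(math.floor(mask_ratio((t + 1) / steps) * num_tokens))
        tokens = pred.copy()
        if n_keep_masked > 0:
            least_confident = np.argsort(conf)[:n_keep_masked]
            tokens[least_confident] = MASK_ID
    return tokens


rng = np.random.default_rng(0)
image_tokens = iterative_decode(num_tokens=256, vocab_size=1024, steps=8, rng=rng)
```

In this sketch a 16x16 grid of 256 tokens is filled in 8 passes (an arbitrary choice for the example), whereas raster-scan autoregressive decoding would need one pass per token. Replacing `toy_model` with a trained network is what would make the refinement meaningful.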

Results

Task | Dataset | Metric | Value | Model
Image Generation | ImageNet 512x512 | FID | 4.46 | MaskGIT (a=0.05)
Image Generation | ImageNet 512x512 | Inception score | 342 | MaskGIT (a=0.05)
Image Generation | ImageNet 512x512 | FID | 7.32 | MaskGIT
Image Generation | ImageNet 512x512 | Inception score | 156 | MaskGIT
Image Generation | ImageNet 256x256 | FID | 4.02 | MaskGIT (a=0.05)
Image Generation | ImageNet 256x256 | FID | 6.18 | MaskGIT
Image Generation | LHQC | Block-FID | 24.33 | MaskGIT
Image Reconstruction | ImageNet | FID | 2.28 | MaskGIT-VQGAN (16x16)
Text-to-Image Generation | LHQC | Block-FID | 24.33 | MaskGIT
Image Outpainting | LHQC | Block-FID (Right Extend) | 14.68 | MaskGIT
Image Outpainting | LHQC | Block-FID (Down Extend) | 25.57 | MaskGIT
Image Outpainting | LHQC | Block-FID (Left Extend) | 14.81 | MaskGIT
Image Outpainting | LHQC | Block-FID (Up Extend) | 25.38 | MaskGIT
10-shot image generation | LHQC | Block-FID | 24.33 | MaskGIT
1 Image, 2*2 Stitchi | LHQC | Block-FID | 24.33 | MaskGIT
