MaskGIT: Masked Generative Image Transformer

Huiwen Chang, Han Zhang, Lu Jiang, Ce Liu, William T. Freeman

2022-02-08 | CVPR 2022

Tasks: Text-to-Image Generation, Image Reconstruction, Image Outpainting, Image Generation, Image Manipulation

Abstract

Generative transformers have experienced rapid popularity growth in the computer vision community in synthesizing high-fidelity and high-resolution images. The best generative transformer models so far, however, still treat an image naively as a sequence of tokens, and decode an image sequentially following the raster scan ordering (i.e. line-by-line). We find this strategy neither optimal nor efficient. This paper proposes a novel image synthesis paradigm using a bidirectional transformer decoder, which we term MaskGIT. During training, MaskGIT learns to predict randomly masked tokens by attending to tokens in all directions. At inference time, the model begins with generating all tokens of an image simultaneously, and then refines the image iteratively conditioned on the previous generation. Our experiments demonstrate that MaskGIT significantly outperforms the state-of-the-art transformer model on the ImageNet dataset, and accelerates autoregressive decoding by up to 64x. Besides, we illustrate that MaskGIT can be easily extended to various image editing tasks, such as inpainting, extrapolation, and image manipulation.
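
The inference procedure described in the abstract (predict all tokens at once, then refine iteratively) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: `toy_predictor` is a hypothetical stand-in that returns random tokens and confidences instead of a trained bidirectional transformer, `MASK` is a made-up sentinel id, and the cosine schedule for how many tokens stay masked is one possible choice that the abstract itself does not specify.

```python
import math
import random

MASK = -1  # hypothetical sentinel id for a masked token


def toy_predictor(tokens, rng, vocab_size=16):
    """Stand-in for the bidirectional transformer (assumption, not the real model).

    Returns one (token, confidence) guess per position. A real model would
    condition each guess on all currently unmasked tokens.
    """
    return [(rng.randrange(vocab_size), rng.random()) for _ in tokens]


def iterative_decode(num_tokens=16, steps=4, seed=0):
    """Decode all tokens in `steps` parallel passes instead of one token at a time."""
    rng = random.Random(seed)
    tokens = [MASK] * num_tokens  # start with every token masked
    masked_per_step = []
    for t in range(1, steps + 1):
        preds = toy_predictor(tokens, rng)
        # Fill every masked position in parallel; already fixed tokens are kept.
        candidate = [p[0] if tok == MASK else tok for tok, p in zip(tokens, preds)]
        # Assumed cosine schedule: number of tokens left masked after step t.
        n_mask = max(0, math.floor(num_tokens * math.cos(math.pi / 2 * t / steps)))
        # Re-mask the least confident of the positions predicted in this step.
        open_positions = [i for i, tok in enumerate(tokens) if tok == MASK]
        open_positions.sort(key=lambda i: preds[i][1])
        remask = set(open_positions[:n_mask])
        tokens = [MASK if i in remask else candidate[i] for i in range(num_tokens)]
        masked_per_step.append(len(remask))
    return tokens, masked_per_step


if __name__ == "__main__":
    final_tokens, schedule = iterative_decode()
    print(schedule)      # number of tokens still masked after each step
    print(final_tokens)  # fully decoded token ids (random in this toy setup)
```

In this sketch the number of model calls equals `steps`, whereas raster-scan autoregressive decoding needs one call per token; the gap between those two counts is the source of the kind of speedup the abstract reports.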

Results

Task	Dataset	Metric	Value	Model
Image Generation	ImageNet 512x512	FID	4.46	MaskGIT (a=0.05)
Image Generation	ImageNet 512x512	Inception score	342	MaskGIT (a=0.05)
Image Generation	ImageNet 512x512	FID	7.32	MaskGIT
Image Generation	ImageNet 512x512	Inception score	156	MaskGIT
Image Generation	ImageNet 256x256	FID	4.02	MaskGIT (a=0.05)
Image Generation	ImageNet 256x256	FID	6.18	MaskGIT
Image Generation	LHQC	Block-FID	24.33	MaskGIT
Image Reconstruction	ImageNet	FID	2.28	MaskGIT-VQGAN (16x16)
Text-to-Image Generation	LHQC	Block-FID	24.33	MaskGIT
Image Outpainting	LHQC	Block-FID (Right Extend)	14.68	MaskGIT
Image Outpainting	LHQC	Block-FID (Down Extend)	25.57	MaskGIT
Image Outpainting	LHQC	Block-FID (Left Extend)	14.81	MaskGIT
Image Outpainting	LHQC	Block-FID (Up Extend)	25.38	MaskGIT
10-shot image generation	LHQC	Block-FID	24.33	MaskGIT
1 Image, 2*2 Stitchi	LHQC	Block-FID	24.33	MaskGIT
