Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization

Mingkai Jia, Wei Yin, Xiaotao Hu, Jiaxin Guo, Xiaoyang Guo, Qian Zhang, Xiao-Xiao Long, Ping Tan

2025-07-10 · Quantization · 2k
Paper · PDF · Code (official)

Abstract

Vector Quantized Variational Autoencoders (VQ-VAEs) are fundamental models that compress continuous visual data into discrete tokens. Existing methods have tried to improve the quantization strategy for better reconstruction quality; however, a large gap remains between VQ-VAEs and VAEs. To narrow this gap, we propose MGVQ, a novel method that augments the representation capability of discrete codebooks, easing codebook optimization and minimizing information loss, thereby enhancing reconstruction quality. Specifically, we retain the latent dimension to preserve encoded features and incorporate a set of sub-codebooks for quantization. Furthermore, we construct comprehensive zero-shot benchmarks at 512p and 2k resolutions to rigorously evaluate the reconstruction performance of existing methods. MGVQ achieves state-of-the-art performance among all VQ-VAEs on both ImageNet and eight zero-shot benchmarks. Notably, it outperforms SD-VAE on ImageNet by a significant margin (rFID 0.49 vs. 0.91) and achieves superior PSNR on all zero-shot benchmarks. These results highlight the superiority of MGVQ in reconstruction and pave the way for preserving fidelity in HD image processing tasks. Code will be publicly available at https://github.com/MKJia/MGVQ.
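The core idea — splitting the latent channels into groups and quantizing each group against its own sub-codebook — can be sketched as follows. This is a minimal NumPy illustration of multi-group nearest-neighbor quantization, not the paper's actual implementation; the function name, shapes, and grouping scheme are assumptions for clarity.

```python
import numpy as np

def multi_group_quantize(z, codebooks):
    """Quantize a latent map z of shape (H, W, D) with G sub-codebooks.

    The D channels are split into G contiguous groups of D // G channels;
    each group vector is replaced by its nearest entry (L2 distance) in the
    corresponding sub-codebook. Hypothetical simplification of MGVQ's
    multi-group quantization; the paper's exact scheme may differ.
    """
    H, W, D = z.shape
    G = len(codebooks)
    d = D // G  # channels per group
    z_q = np.empty_like(z)
    indices = np.empty((H, W, G), dtype=np.int64)
    for g, cb in enumerate(codebooks):  # cb has shape (K, d)
        zg = z[..., g * d:(g + 1) * d].reshape(-1, d)            # (H*W, d)
        dist = ((zg[:, None, :] - cb[None, :, :]) ** 2).sum(-1)  # (H*W, K)
        idx = dist.argmin(axis=1)                                # nearest code
        indices[..., g] = idx.reshape(H, W)
        z_q[..., g * d:(g + 1) * d] = cb[idx].reshape(H, W, d)
    return z_q, indices
```

Because each group keeps its own codebook of size K, the effective vocabulary per spatial position is K**G while each lookup stays cheap — one plausible reading of how sub-codebooks enlarge representation capacity without a single huge codebook.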

Results

Task                 | Dataset                                            | Metric          | Value | Model
---------------------|----------------------------------------------------|-----------------|-------|----------------
Image Generation     | ImageNet 256x256                                   | FID             | 3.02  | MGVQ
Image Generation     | ImageNet 256x256                                   | Inception score | 294.1 | MGVQ
Image Reconstruction | Ultra-High Resolution Image Reconstruction Benchmark | LPIPS         | 0.092 | MGVQ (16x16x4)
Image Reconstruction | Ultra-High Resolution Image Reconstruction Benchmark | PSNR          | 28.27 | MGVQ (16x16x4)
Image Reconstruction | Ultra-High Resolution Image Reconstruction Benchmark | SSIM          | 0.844 | MGVQ (16x16x4)
Image Reconstruction | Ultra-High Resolution Image Reconstruction Benchmark | rFID          | 1.59  | MGVQ (16x16x4)
Image Reconstruction | ImageNet                                           | FID             | 0.49  | MGVQ (16x16x8)
Image Reconstruction | ImageNet                                           | LPIPS           | 0.086 | MGVQ (16x16x8)
Image Reconstruction | ImageNet                                           | PSNR            | 24.7  | MGVQ (16x16x8)
Image Reconstruction | ImageNet                                           | SSIM            | 0.787 | MGVQ (16x16x8)
Image Reconstruction | ImageNet                                           | FID             | 0.64  | MGVQ (16x16x4)
Image Reconstruction | ImageNet                                           | LPIPS           | 0.11  | MGVQ (16x16x4)
Image Reconstruction | ImageNet                                           | PSNR            | 23.71 | MGVQ (16x16x4)
Image Reconstruction | ImageNet                                           | SSIM            | 0.755 | MGVQ (16x16x4)

Related Papers

- Efficient Deployment of Spiking Neural Networks on SpiNNaker2 for DVS Gesture Recognition Using Neuromorphic Intermediate Representation (2025-09-04)
- An End-to-End DNN Inference Framework for the SpiNNaker2 Neuromorphic MPSoC (2025-07-18)
- Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine (2025-07-17)
- Angle Estimation of a Single Source with Massive Uniform Circular Arrays (2025-07-17)
- Quantized Rank Reduction: A Communications-Efficient Federated Learning Scheme for Network-Critical Applications (2025-07-15)
- MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization (2025-07-14)
- Lightweight Federated Learning over Wireless Edge Networks (2025-07-13)
- Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation (2025-07-11)