Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization

Mingkai Jia, Wei Yin, Xiaotao Hu, Jiaxin Guo, Xiaoyang Guo, Qian Zhang, Xiao-Xiao Long, Ping Tan

2025-07-10 · Quantization · 2k
Paper · PDF · Code (official)

Abstract

Vector Quantized Variational Autoencoders (VQ-VAEs) are fundamental models that compress continuous visual data into discrete tokens. Existing methods have tried to improve the quantization strategy for better reconstruction quality; however, a large gap remains between VQ-VAEs and VAEs. To narrow this gap, we propose MGVQ, a novel method that augments the representation capability of discrete codebooks, easing codebook optimization and minimizing information loss, thereby enhancing reconstruction quality. Specifically, we retain the latent dimension to preserve encoded features and incorporate a set of sub-codebooks for quantization. Furthermore, we construct comprehensive zero-shot benchmarks at 512p and 2k resolutions to rigorously evaluate the reconstruction performance of existing methods. MGVQ achieves state-of-the-art performance among all VQ-VAEs on both ImageNet and eight zero-shot benchmarks. Notably, it outperforms SD-VAE on ImageNet by a significant margin (rFID 0.49 vs. 0.91) and achieves superior PSNR on all zero-shot benchmarks. These results highlight the superiority of MGVQ in reconstruction and pave the way for preserving fidelity in HD image processing tasks. Code will be publicly available at https://github.com/MKJia/MGVQ.
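The core idea — splitting the latent channels into groups and quantizing each group against its own sub-codebook — can be sketched as follows. This is a minimal NumPy illustration of multi-group nearest-neighbor quantization, not the paper's actual implementation; the function name, shapes, and grouping scheme are assumptions for clarity.

```python
import numpy as np

def multi_group_quantize(z, codebooks):
    """Quantize a latent map z of shape (H, W, D) with G sub-codebooks.

    The D channels are split into G contiguous groups of D // G channels;
    each group vector is replaced by its nearest entry (L2 distance) in the
    corresponding sub-codebook. Hypothetical simplification of MGVQ's
    multi-group quantization; the paper's exact scheme may differ.
    """
    H, W, D = z.shape
    G = len(codebooks)
    d = D // G  # channels per group
    z_q = np.empty_like(z)
    indices = np.empty((H, W, G), dtype=np.int64)
    for g, cb in enumerate(codebooks):  # cb has shape (K, d)
        zg = z[..., g * d:(g + 1) * d].reshape(-1, d)            # (H*W, d)
        dist = ((zg[:, None, :] - cb[None, :, :]) ** 2).sum(-1)  # (H*W, K)
        idx = dist.argmin(axis=1)                                # nearest code
        indices[..., g] = idx.reshape(H, W)
        z_q[..., g * d:(g + 1) * d] = cb[idx].reshape(H, W, d)
    return z_q, indices
```

Because each group keeps its own codebook of size K, the effective vocabulary per spatial position is K**G while each lookup stays cheap — one plausible reading of how sub-codebooks enlarge representation capacity without a single huge codebook.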

Results

Task                 | Dataset                                            | Metric          | Value | Model
---------------------|----------------------------------------------------|-----------------|-------|----------------
Image Generation     | ImageNet 256x256                                   | FID             | 3.02  | MGVQ
Image Generation     | ImageNet 256x256                                   | Inception score | 294.1 | MGVQ
Image Reconstruction | Ultra-High Resolution Image Reconstruction Benchmark | LPIPS         | 0.092 | MGVQ (16x16x4)
Image Reconstruction | Ultra-High Resolution Image Reconstruction Benchmark | PSNR          | 28.27 | MGVQ (16x16x4)
Image Reconstruction | Ultra-High Resolution Image Reconstruction Benchmark | SSIM          | 0.844 | MGVQ (16x16x4)
Image Reconstruction | Ultra-High Resolution Image Reconstruction Benchmark | rFID          | 1.59  | MGVQ (16x16x4)
Image Reconstruction | ImageNet                                           | FID             | 0.49  | MGVQ (16x16x8)
Image Reconstruction | ImageNet                                           | LPIPS           | 0.086 | MGVQ (16x16x8)
Image Reconstruction | ImageNet                                           | PSNR            | 24.7  | MGVQ (16x16x8)
Image Reconstruction | ImageNet                                           | SSIM            | 0.787 | MGVQ (16x16x8)
Image Reconstruction | ImageNet                                           | FID             | 0.64  | MGVQ (16x16x4)
Image Reconstruction | ImageNet                                           | LPIPS           | 0.11  | MGVQ (16x16x4)
Image Reconstruction | ImageNet                                           | PSNR            | 23.71 | MGVQ (16x16x4)
Image Reconstruction | ImageNet                                           | SSIM            | 0.755 | MGVQ (16x16x4)

Related Papers

- Efficient Deployment of Spiking Neural Networks on SpiNNaker2 for DVS Gesture Recognition Using Neuromorphic Intermediate Representation (2025-09-04)
- An End-to-End DNN Inference Framework for the SpiNNaker2 Neuromorphic MPSoC (2025-07-18)
- Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine (2025-07-17)
- Angle Estimation of a Single Source with Massive Uniform Circular Arrays (2025-07-17)
- Quantized Rank Reduction: A Communications-Efficient Federated Learning Scheme for Network-Critical Applications (2025-07-15)
- MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization (2025-07-14)
- Lightweight Federated Learning over Wireless Edge Networks (2025-07-13)
- Vision Foundation Models as Effective Visual Tokenizers for Autoregressive Image Generation (2025-07-11)