Preventing Local Pitfalls in Vector Quantization via Optimal Transport

Borui Zhang, Wenzhao Zheng, Jie Zhou, Jiwen Lu

2024-12-19 · Quantization · Image Reconstruction

Abstract

Vector-quantized networks (VQNs) have exhibited remarkable performance across various tasks, yet they are prone to training instability, which complicates the training process due to the necessity for techniques such as careful initialization and model distillation. In this study, we identify the local minima issue as the primary cause of this instability. To address this, we integrate an optimal transport method in place of the nearest neighbor search to achieve a more globally informed assignment. We introduce OptVQ, a novel vector quantization method that employs the Sinkhorn algorithm to solve the optimal transport problem, thereby enhancing the stability and efficiency of the training process. To mitigate the influence of diverse data distributions on the Sinkhorn algorithm, we implement a straightforward yet effective normalization strategy. Our comprehensive experiments on image reconstruction tasks demonstrate that OptVQ achieves 100% codebook utilization and surpasses current state-of-the-art VQNs in reconstruction quality.
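The contrast the abstract draws between nearest neighbor search and an optimal transport assignment can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the uniform marginals, the entropic regularization strength `eps`, the iteration count, and the min-max rescaling of the distance matrix are all assumptions chosen for illustration (the abstract mentions a normalization strategy but does not specify it here).

```python
import numpy as np

def pairwise_sq_dists(z, codebook):
    """Squared Euclidean distances, shape (n_features, n_codes)."""
    diff = z[:, None, :] - codebook[None, :, :]
    return (diff ** 2).sum(axis=-1)

def nearest_neighbor_assign(z, codebook):
    """Conventional VQ: each feature independently picks its closest code."""
    return pairwise_sq_dists(z, codebook).argmin(axis=1)

def _logsumexp(a, axis):
    # Numerically stable log(sum(exp(a))) along one axis.
    m = a.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=axis, keepdims=True))).squeeze(axis)

def sinkhorn_assign(z, codebook, eps=0.05, n_iters=200):
    """Entropic optimal transport with uniform marginals (log-domain Sinkhorn).

    Returns the per-feature argmax of the transport plan and the plan itself.
    """
    d = pairwise_sq_dists(z, codebook)
    # Assumed normalization: rescale distances to [0, 1] so that eps has a
    # comparable effect regardless of the scale of the data.
    d = (d - d.min()) / (d.max() - d.min() + 1e-12)
    n, k = d.shape
    log_kernel = -d / eps
    log_r = np.full(n, -np.log(n))  # each feature carries mass 1/n
    log_c = np.full(k, -np.log(k))  # each code should receive mass 1/k
    u = np.zeros(n)
    v = np.zeros(k)
    for _ in range(n_iters):
        u = log_r - _logsumexp(log_kernel + v[None, :], axis=1)
        v = log_c - _logsumexp(log_kernel + u[:, None], axis=0)
    plan = np.exp(u[:, None] + log_kernel + v[None, :])
    return plan.argmax(axis=1), plan

def codebook_utilization(indices, n_codes):
    """Fraction of codes selected at least once."""
    return np.unique(indices).size / n_codes

# Toy data: all features sit near code 0, the other codes are far away.
rng = np.random.default_rng(0)
codebook = np.array([[0.0, 0.0], [10.0, 10.0], [20.0, 20.0], [30.0, 30.0]])
z = rng.normal(scale=0.1, size=(64, 2))

nn_idx = nearest_neighbor_assign(z, codebook)
ot_idx, plan = sinkhorn_assign(z, codebook)
print(codebook_utilization(nn_idx, 4))  # 0.25: every feature is closest to code 0
print(codebook_utilization(ot_idx, 4))
print(plan.sum(axis=0))                 # mass received by each code
```

In this toy setup nearest neighbor search collapses onto a single code, while the transport plan is constrained to give every code the same total mass. Taking the row-wise argmax of the plan is one simple way to turn it into hard assignments; it is used here only to keep the sketch short.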

Results

Task	Dataset	Metric	Value	Model
Image Reconstruction	ImageNet	FID	0.91	OptVQ (16x16x8)
Image Reconstruction	ImageNet	LPIPS	0.066	OptVQ (16x16x8)
Image Reconstruction	ImageNet	PSNR	27.57	OptVQ (16x16x8)
Image Reconstruction	ImageNet	SSIM	0.729	OptVQ (16x16x8)
Image Reconstruction	ImageNet	FID	1	OptVQ (16x16x4)
Image Reconstruction	ImageNet	LPIPS	0.076	OptVQ (16x16x4)
Image Reconstruction	ImageNet	PSNR	26.59	OptVQ (16x16x4)
Image Reconstruction	ImageNet	SSIM	0.717	OptVQ (16x16x4)
