
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Preventing Local Pitfalls in Vector Quantization via Optimal Transport

Borui Zhang, Wenzhao Zheng, Jie Zhou, Jiwen Lu

2024-12-19 | Quantization | Image Reconstruction

Abstract

Vector-quantized networks (VQNs) have exhibited remarkable performance across various tasks, yet they are prone to training instability, which complicates the training process due to the necessity for techniques such as subtle initialization and model distillation. In this study, we identify the local minima issue as the primary cause of this instability. To address this, we integrate an optimal transport method in place of the nearest neighbor search to achieve a more globally informed assignment. We introduce OptVQ, a novel vector quantization method that employs the Sinkhorn algorithm to optimize the optimal transport problem, thereby enhancing the stability and efficiency of the training process. To mitigate the influence of diverse data distributions on the Sinkhorn algorithm, we implement a straightforward yet effective normalization strategy. Our comprehensive experiments on image reconstruction tasks demonstrate that OptVQ achieves 100% codebook utilization and surpasses current state-of-the-art VQNs in reconstruction quality.
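The difference between nearest-neighbor assignment and a Sinkhorn-based assignment can be illustrated with a small sketch. This is not the authors' implementation: the function names, the `epsilon` and `n_iters` parameters, the uniform marginals, and the min-max scaling of the distance matrix are all assumptions made for illustration (the abstract does not specify the normalization OptVQ uses). The toy data places every point closer to code 0, so nearest-neighbor search never selects code 1, while the balanced transport plan splits the points evenly.

```python
import math


def squared_distances(points, codebook):
    """Return the n x k matrix of squared Euclidean distances."""
    return [[sum((p - c) ** 2 for p, c in zip(x, code)) for code in codebook]
            for x in points]


def nearest_neighbor_assign(points, codebook):
    """Conventional VQ: each point independently picks its closest code."""
    dist = squared_distances(points, codebook)
    return [min(range(len(row)), key=row.__getitem__) for row in dist]


def sinkhorn_assign(points, codebook, epsilon=0.1, n_iters=200):
    """Assign points via an entropy-regularized transport plan (Sinkhorn).

    Assumptions of this sketch: uniform mass on points and on codes,
    and min-max scaling of the cost matrix to [0, 1].
    """
    dist = squared_distances(points, codebook)
    n, k = len(points), len(codebook)

    # Assumed normalization: rescale costs so epsilon has a stable meaning.
    lo = min(min(row) for row in dist)
    hi = max(max(row) for row in dist)
    scale = (hi - lo) or 1.0
    kernel = [[math.exp(-((d - lo) / scale) / epsilon) for d in row]
              for row in dist]

    row_mass, col_mass = 1.0 / n, 1.0 / k  # uniform marginals
    u, v = [1.0] * n, [1.0] * k
    for _ in range(n_iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [row_mass / sum(kernel[i][j] * v[j] for j in range(k))
             for i in range(n)]
        v = [col_mass / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(k)]

    plan = [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]
    # Each point takes the code that receives most of its mass.
    return [max(range(k), key=row.__getitem__) for row in plan]


# Toy 1-D data: every point is closer to code 0 than to code 1.
points = [(0.0,), (0.1,), (0.2,), (0.3,)]
codebook = [(0.0,), (1.0,)]

print(nearest_neighbor_assign(points, codebook))  # [0, 0, 0, 0]
print(sinkhorn_assign(points, codebook))          # [0, 0, 1, 1]
```

In this toy case the nearest-neighbor rule leaves code 1 unused, whereas the column constraint of the transport plan forces both codes to receive equal mass. That is the intuition behind the full codebook utilization the abstract reports, though the actual method operates on learned encoder features at much larger scale.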

Results

Task                  Dataset   Metric  Value  Model
Image Reconstruction  ImageNet  FID     0.91   OptVQ (16x16x8)
Image Reconstruction  ImageNet  LPIPS   0.066  OptVQ (16x16x8)
Image Reconstruction  ImageNet  PSNR    27.57  OptVQ (16x16x8)
Image Reconstruction  ImageNet  SSIM    0.729  OptVQ (16x16x8)
Image Reconstruction  ImageNet  FID     1      OptVQ (16x16x4)
Image Reconstruction  ImageNet  LPIPS   0.076  OptVQ (16x16x4)
Image Reconstruction  ImageNet  PSNR    26.59  OptVQ (16x16x4)
Image Reconstruction  ImageNet  SSIM    0.717  OptVQ (16x16x4)

Related Papers

Efficient Deployment of Spiking Neural Networks on SpiNNaker2 for DVS Gesture Recognition Using Neuromorphic Intermediate Representation (2025-09-04)
An End-to-End DNN Inference Framework for the SpiNNaker2 Neuromorphic MPSoC (2025-07-18)
Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine (2025-07-17)
Angle Estimation of a Single Source with Massive Uniform Circular Arrays (2025-07-17)
Quantized Rank Reduction: A Communications-Efficient Federated Learning Scheme for Network-Critical Applications (2025-07-15)
The model is the message: Lightweight convolutional autoencoders applied to noisy imaging data for planetary science and astrobiology (2025-07-15)
3D Magnetic Inverse Routine for Single-Segment Magnetic Field Images (2025-07-15)
MGVQ: Could VQ-VAE Beat VAE? A Generalizable Tokenizer with Multi-group Quantization (2025-07-14)