Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Efficient-VDVAE: Less is more

Louay Hazami, Rayhane Mama, Ragavan Thurairatnam

2022-03-25 | Quantization | Image Generation

Abstract

Hierarchical VAEs have emerged in recent years as a reliable option for maximum likelihood estimation. However, instability issues and demanding computational requirements have hindered research progress in the area. We present simple modifications to the Very Deep VAE to make it converge up to $2.6\times$ faster, save up to $20\times$ in memory load and improve stability during training. Despite these changes, our models achieve comparable or better negative log-likelihood performance than current state-of-the-art models on all $7$ commonly used image datasets we evaluated on. We also make an argument against using 5-bit benchmarks as a way to measure hierarchical VAE's performance due to undesirable biases caused by the 5-bit quantization. Additionally, we empirically demonstrate that roughly $3\%$ of the hierarchical VAE's latent space dimensions is sufficient to encode most of the image information, without loss of performance, opening up the doors to efficiently leverage the hierarchical VAEs' latent space in downstream tasks. We release our source code and models at https://github.com/Rayhane-mamah/Efficient-VDVAE .
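To make the 5-bit benchmark discussion concrete, here is a minimal sketch of one common way to reduce 8-bit pixel values to 5 bits, by dropping the three low-order bits. This is an assumption for illustration only: the abstract does not describe the preprocessing used in the paper, and the function name `quantize_to_bits` is hypothetical.

```python
def quantize_to_bits(pixels, n_bits=5):
    """Reduce 8-bit pixel values (0..255) to n_bits by dropping low-order bits.

    Illustrative convention only; not taken from the paper or its code.
    """
    if not 1 <= n_bits <= 8:
        raise ValueError("n_bits must be between 1 and 8")
    shift = 8 - n_bits
    return [p >> shift for p in pixels]


row = [0, 7, 8, 130, 255]
print(quantize_to_bits(row))  # [0, 0, 1, 16, 31]

# All 256 input levels collapse onto 2**5 = 32 output levels.
print(len(set(quantize_to_bits(range(256)))))  # 32
```

Under this convention, every group of 8 adjacent intensity levels maps to a single value, so fine detail in the original image is discarded before the model sees it.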

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Image Generation | Binarized MNIST | nats | 79.09 | Efficient-VDVAE |
| Image Generation | CelebA 64x64 | bits/dimension | 1.83 | Efficient-VDVAE |
| Image Generation | FFHQ 256 x 256 | FID | 34.88 | Efficient-VDVAE |
| Image Generation | FFHQ 256 x 256 | bits/dimension | 0.53 | Efficient-VDVAE |
| Image Generation | FFHQ 256 x 256 | FD | 514.16 | Efficient-VDVAE (DINOv2) |
| Image Generation | FFHQ 256 x 256 | Precision | 0.86 | Efficient-VDVAE (DINOv2) |
| Image Generation | FFHQ 256 x 256 | Recall | 0.14 | Efficient-VDVAE (DINOv2) |
| Image Generation | CelebA-HQ 1024x1024 | bits/dimension | 1.01 | Efficient-VDVAE |
| Image Generation | CelebA 256x256 | bpd | 0.51 | Efficient-VDVAE |
| Image Generation | CelebA 256x256 | bpd (8-bits) | 1.35 | Efficient-VDVAE |
| Image Generation | FFHQ 1024 x 1024 | bits/dimension | 2.3 | Efficient-VDVAE |
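The results above mix two likelihood units: nats (Binarized MNIST) and bits/dimension (the other likelihood entries). A small sketch of the standard conversion between them follows. It assumes the nats figure is a per-image negative log-likelihood and that Binarized MNIST images have 28 x 28 = 784 dimensions; the helper name `nats_to_bits_per_dim` is hypothetical.

```python
import math


def nats_to_bits_per_dim(nll_nats, num_dims):
    """Convert a per-example NLL in nats to bits per dimension.

    Dividing by ln(2) converts nats to bits; dividing by num_dims
    normalizes by the number of pixels (times channels) per example.
    """
    return nll_nats / (num_dims * math.log(2))


# Assumption: 79.09 nats is a per-image NLL on 28 x 28 single-channel images.
bpd = nats_to_bits_per_dim(79.09, 28 * 28)
print(f"{bpd:.3f} bits/dimension")
```

Normalizing by dimension count is what allows likelihoods to be compared across image resolutions, although values computed on differently quantized data (for example 5-bit versus 8-bit) are still not directly comparable.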

Related Papers

Efficient Deployment of Spiking Neural Networks on SpiNNaker2 for DVS Gesture Recognition Using Neuromorphic Intermediate Representation (2025-09-04)
An End-to-End DNN Inference Framework for the SpiNNaker2 Neuromorphic MPSoC (2025-07-18)
Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine (2025-07-17)
Angle Estimation of a Single Source with Massive Uniform Circular Arrays (2025-07-17)
fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Synthesizing Reality: Leveraging the Generative AI-Powered Platform Midjourney for Construction Worker Detection (2025-07-17)
FashionPose: Text to Pose to Relight Image Generation for Personalized Fashion Visualization (2025-07-17)
A Distributed Generative AI Approach for Heterogeneous Multi-Domain Environments under Data Sharing constraints (2025-07-17)