Efficient-VDVAE: Less is more

Louay Hazami, Rayhane Mama, Ragavan Thurairatnam

2022-03-25 | Quantization, Image Generation

Abstract

Hierarchical VAEs have emerged in recent years as a reliable option for maximum likelihood estimation. However, instability issues and demanding computational requirements have hindered research progress in the area. We present simple modifications to the Very Deep VAE that make it converge up to $2.6\times$ faster, reduce its memory load by up to $20\times$ and improve stability during training. Despite these changes, our models achieve comparable or better negative log-likelihood performance than current state-of-the-art models on all $7$ commonly used image datasets we evaluated on. We also make an argument against using 5-bit benchmarks to measure hierarchical VAEs' performance, due to undesirable biases caused by the 5-bit quantization. Additionally, we empirically demonstrate that roughly $3\%$ of the hierarchical VAE's latent space dimensions are sufficient to encode most of the image information without loss of performance, opening the door to efficiently leveraging the hierarchical VAEs' latent space in downstream tasks. We release our source code and models at https://github.com/Rayhane-mamah/Efficient-VDVAE.
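To make the 5-bit setting concrete: 5-bit benchmarks keep only 32 intensity levels per subpixel instead of 256. The NumPy sketch below assumes a common convention, dropping the three least-significant bits and mapping each level to its bin centre in [-1, 1]; it is not taken from the Efficient-VDVAE repository, and the function names are hypothetical. It also shows that a uniform (worst-case) model costs log2(32) = 5 bits/dimension on 5-bit data versus 8 on 8-bit data, so bits/dimension figures from the two settings sit on different scales.

```python
import math

import numpy as np


def quantize_to_bits(x_uint8: np.ndarray, n_bits: int = 5) -> np.ndarray:
    """Reduce an 8-bit image to n_bits per subpixel by dropping low-order bits.

    Hypothetical helper; assumes the floor(x / 2**(8 - n_bits)) convention.
    """
    if not 1 <= n_bits <= 8:
        raise ValueError("n_bits must be in [1, 8]")
    shift = 8 - n_bits
    return (x_uint8.astype(np.int64) >> shift).astype(np.uint8)


def to_unit_range(x_q: np.ndarray, n_bits: int) -> np.ndarray:
    """Map integer levels {0, ..., 2**n_bits - 1} to bin centres in [-1, 1]."""
    n_levels = 2 ** n_bits
    return (2.0 * x_q.astype(np.float64) + 1.0) / n_levels - 1.0


def uniform_baseline_bpd(n_bits: int) -> float:
    """Bits per dimension of a uniform distribution over 2**n_bits levels."""
    return math.log2(2 ** n_bits)


if __name__ == "__main__":
    pixels = np.array([0, 7, 8, 200, 255], dtype=np.uint8)
    q = quantize_to_bits(pixels, n_bits=5)
    print(q.tolist())                        # [0, 0, 1, 25, 31]
    print(to_unit_range(q, n_bits=5))        # bin centres in [-1, 1]
    print(uniform_baseline_bpd(8), uniform_baseline_bpd(5))  # 8.0 5.0
```

Because the quantization collapses every group of 8 neighbouring 8-bit values into one level, a 5-bit score cannot be compared directly with an 8-bit one.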

Results

Task	Dataset	Metric	Value	Model
Image Generation	Binarized MNIST	NLL (nats)	79.09	Efficient-VDVAE
Image Generation	CelebA 64x64	bits/dimension	1.83	Efficient-VDVAE
Image Generation	FFHQ 256x256	FID	34.88	Efficient-VDVAE
Image Generation	FFHQ 256x256	bits/dimension	0.53	Efficient-VDVAE
Image Generation	FFHQ 256x256	FD	514.16	Efficient-VDVAE (DINOv2)
Image Generation	FFHQ 256x256	Precision	0.86	Efficient-VDVAE (DINOv2)
Image Generation	FFHQ 256x256	Recall	0.14	Efficient-VDVAE (DINOv2)
Image Generation	CelebA-HQ 1024x1024	bits/dimension	1.01	Efficient-VDVAE
Image Generation	CelebA 256x256	bits/dimension	0.51	Efficient-VDVAE
Image Generation	CelebA 256x256	bits/dimension (8-bit)	1.35	Efficient-VDVAE
Image Generation	FFHQ 1024x1024	bits/dimension	2.3	Efficient-VDVAE
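The table mixes two likelihood units: nats (Binarized MNIST) and bits per dimension (all other datasets). The conversion is bpd = NLL_nats / (D · ln 2), where D = H × W × C is the number of modelled subpixels. The sketch below assumes the usual convention that the MNIST figure is a per-image NLL over 28 × 28 × 1 = 784 binary pixels; the helper names are hypothetical.

```python
import math


def image_dims(height: int, width: int, channels: int) -> int:
    """Number of modelled dimensions (subpixels) in one image."""
    return height * width * channels


def nats_to_bpd(nll_nats: float, num_dims: int) -> float:
    """Convert a per-image negative log-likelihood in nats to bits per dimension."""
    return nll_nats / (num_dims * math.log(2))


def bpd_to_nats(bpd: float, num_dims: int) -> float:
    """Convert bits per dimension back to a per-image NLL in nats."""
    return bpd * num_dims * math.log(2)


if __name__ == "__main__":
    # Binarized MNIST row: 79.09 nats per image, assumed over 784 pixels.
    print(round(nats_to_bpd(79.09, image_dims(28, 28, 1)), 4))
    # CelebA 64x64 row: 1.83 bits/dimension, expressed as nats per image.
    print(round(bpd_to_nats(1.83, image_dims(64, 64, 3)), 1))
```

Under that assumption, 79.09 nats corresponds to roughly 0.146 bits/dimension, and the CelebA 64x64 score of 1.83 bits/dimension is roughly 15,587 nats per 12,288-dimensional image.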
