Description
VQ-VAE-2 is a type of variational autoencoder that combines a two-level hierarchical VQ-VAE with a self-attention autoregressive model (PixelCNN) as a prior. The encoder and decoder architectures are kept simple and lightweight, as in the original VQ-VAE, with the only difference being that hierarchical multi-scale latent maps are used for increased resolution.
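The core operation shared by both levels of the hierarchy is vector quantization: each continuous latent vector from the encoder is replaced by its nearest entry in a learned codebook, and each level keeps its own codebook. The sketch below illustrates this lookup with NumPy; the codebook sizes, latent shapes, and the `vector_quantize` helper are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def vector_quantize(z, codebook):
    """Map each latent vector in z to its nearest codebook entry.

    z:        (N, D) array of encoder outputs
    codebook: (K, D) array of K learned embedding vectors
    Returns (indices, quantized), where quantized[i] = codebook[indices[i]].
    """
    # Squared Euclidean distance from every latent vector to every code.
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]

# Hierarchical use (illustrative): a coarse "top" latent map and a finer
# "bottom" latent map are quantized with separate codebooks, as in VQ-VAE-2.
rng = np.random.default_rng(0)
top_codebook = rng.normal(size=(8, 4))      # 8 codes, 4-dim embeddings
bottom_codebook = rng.normal(size=(16, 4))  # larger codebook at finer scale
top_latents = rng.normal(size=(4, 4))       # e.g. a 2x2 top map, flattened
bottom_latents = rng.normal(size=(16, 4))   # e.g. a 4x4 bottom map, flattened

top_idx, top_q = vector_quantize(top_latents, top_codebook)
bot_idx, bot_q = vector_quantize(bottom_latents, bottom_codebook)
```

After training, the discrete index maps (`top_idx`, `bot_idx`) are what the autoregressive PixelCNN prior models; sampling new index maps and passing them through the decoder generates new images.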
Papers Using This Method
HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models (2025-03-14)
HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes (2023-12-31)
SeaDSC: A video-based unsupervised method for dynamic scene change detection in unmanned surface vehicles (2023-11-20)
Phased Data Augmentation for Training a Likelihood-Based Generative Model with Limited Data (2023-05-22)
Hierarchical Residual Learning Based Vector Quantized Variational Autoencoder for Image Reconstruction and Generation (2022-08-09)
An Unsupervised Video Game Playstyle Metric via State Discretization (2021-10-03)
Generating Diverse High-Fidelity Images with VQ-VAE-2 (2019-06-02)