Description
VQ-VAE-2 is a type of variational autoencoder that combines a two-level hierarchical VQ-VAE with a self-attention autoregressive model (PixelCNN) as a prior. The encoder and decoder architectures are kept simple and lightweight, as in the original VQ-VAE, with the only difference being that hierarchical multi-scale latent maps are used for increased resolution.
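The core operation shared by both levels of the hierarchy is vector quantization: each continuous latent vector from the encoder is replaced by its nearest entry in a learned codebook, and each level keeps its own codebook. The sketch below illustrates this lookup with NumPy; the codebook sizes, latent shapes, and the `vector_quantize` helper are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def vector_quantize(z, codebook):
    """Map each latent vector in z to its nearest codebook entry.

    z:        (N, D) array of encoder outputs
    codebook: (K, D) array of K learned embedding vectors
    Returns (indices, quantized), where quantized[i] = codebook[indices[i]].
    """
    # Squared Euclidean distance from every latent vector to every code.
    dists = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (N, K)
    indices = dists.argmin(axis=1)
    return indices, codebook[indices]

# Hierarchical use (illustrative): a coarse "top" latent map and a finer
# "bottom" latent map are quantized with separate codebooks, as in VQ-VAE-2.
rng = np.random.default_rng(0)
top_codebook = rng.normal(size=(8, 4))      # 8 codes, 4-dim embeddings
bottom_codebook = rng.normal(size=(16, 4))  # larger codebook at finer scale
top_latents = rng.normal(size=(4, 4))       # e.g. a 2x2 top map, flattened
bottom_latents = rng.normal(size=(16, 4))   # e.g. a 4x4 bottom map, flattened

top_idx, top_q = vector_quantize(top_latents, top_codebook)
bot_idx, bot_q = vector_quantize(bottom_latents, bottom_codebook)
```

After training, the discrete index maps (`top_idx`, `bot_idx`) are what the autoregressive PixelCNN prior models; sampling new index maps and passing them through the decoder generates new images.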
Papers Using This Method
HiTVideo: Hierarchical Tokenizers for Enhancing Text-to-Video Generation with Autoregressive Large Language Models (2025-03-14)
HQ-VAE: Hierarchical Discrete Representation Learning with Variational Bayes (2023-12-31)
SeaDSC: A video-based unsupervised method for dynamic scene change detection in unmanned surface vehicles (2023-11-20)
Phased Data Augmentation for Training a Likelihood-Based Generative Model with Limited Data (2023-05-22)
Hierarchical Residual Learning Based Vector Quantized Variational Autoencoder for Image Reconstruction and Generation (2022-08-09)
An Unsupervised Video Game Playstyle Metric via State Discretization (2021-10-03)
Generating Diverse High-Fidelity Images with VQ-VAE-2 (2019-06-02)