Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


VQ-VAE

Computer Vision · Introduced 2017 · 197 papers

Source Paper: Neural Discrete Representation Learning (van den Oord et al., 2017)

Description

VQ-VAE is a type of variational autoencoder that uses vector quantisation to obtain a discrete latent representation. It differs from standard VAEs in two key ways: the encoder network outputs discrete, rather than continuous, codes; and the prior is learnt rather than static. To learn a discrete latent representation, ideas from vector quantisation (VQ) are incorporated. The VQ method allows the model to circumvent posterior collapse - where the latents are ignored when paired with a powerful autoregressive decoder - an issue typically observed in the VAE framework. Pairing these representations with an autoregressive prior, the model can generate high-quality images, videos, and speech, and also supports high-quality speaker conversion and unsupervised learning of phonemes.
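The core mechanism above can be sketched in a few lines: each continuous encoder output is snapped to its nearest entry in a learned codebook, and training adds a codebook term and a commitment term to the reconstruction loss. The following is a minimal NumPy sketch, not the original implementation; the function names `vector_quantize` and `vq_vae_loss` are illustrative, and the stop-gradient operations used in the actual paper are indicated only in comments, since plain NumPy has no autograd.

```python
import numpy as np

def vector_quantize(z_e, codebook):
    """Map each encoder output vector to its nearest codebook entry.

    z_e:      (N, D) continuous encoder outputs
    codebook: (K, D) learned embedding vectors
    Returns (z_q, indices): quantized vectors and the chosen code indices.
    """
    # Squared Euclidean distance between every latent and every code: (N, K)
    dists = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    indices = dists.argmin(axis=1)
    return codebook[indices], indices

def vq_vae_loss(x, x_recon, z_e, z_q, beta=0.25):
    """VQ-VAE training loss: reconstruction + codebook + commitment terms."""
    recon = ((x - x_recon) ** 2).mean()
    # Codebook term: moves codes toward encoder outputs (stop-gradient on z_e)
    codebook_loss = ((z_q - z_e) ** 2).mean()
    # Commitment term: keeps encoder outputs close to codes (stop-gradient on z_q)
    commitment = beta * ((z_e - z_q) ** 2).mean()
    return recon + codebook_loss + commitment
```

Because `argmin` is non-differentiable, the paper trains the encoder with a straight-through estimator: gradients from the decoder are copied from `z_q` back to `z_e` unchanged.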

Papers Using This Method

- DuetGen: Music Driven Two-Person Dance Generation via Hierarchical Masked Modeling (2025-06-23)
- Policy-Based Trajectory Clustering in Offline Reinforcement Learning (2025-06-10)
- STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization (2025-06-04)
- VesselGPT: Autoregressive Modeling of Vascular Geometry (2025-05-19)
- M3G: Multi-Granular Gesture Generator for Audio-Driven Full-Body Human Motion Synthesis (2025-05-13)
- Towards Foundation Models for Experimental Readout Systems Combining Discrete and Continuous Data (2025-05-13)
- Ego4o: Egocentric Human Motion Capture and Understanding from Multi-Modal Input (2025-04-11)
- Instruction-Guided Autoregressive Neural Network Parameter Generation (2025-04-02)
- Make Some Noise: Towards LLM audio reasoning and generation using sound tokens (2025-03-28)
- AGIR: Assessing 3D Gait Impairment with Reasoning based on LLMs (2025-03-23)
- Improving Autoregressive Image Generation through Coarse-to-Fine Token Prediction (2025-03-20)
- A Foundation Model for Patient Behavior Monitoring and Suicide Detection (2025-03-19)
- GenM$^3$: Generative Pretrained Multi-path Motion Model for Text Conditional Human Motion Generation (2025-03-19)
- BioSerenity-E1: a self-supervised EEG model for medical applications (2025-03-13)
- UniGenX: Unified Generation of Sequence and Structure with Autoregressive Diffusion (2025-03-09)
- Enhancing Spoken Discourse Modeling in Language Models Using Gestural Cues (2025-03-05)
- CAPS: Context-Aware Priority Sampling for Enhanced Imitation Learning in Autonomous Driving (2025-03-03)
- Vevo: Controllable Zero-Shot Voice Imitation with Self-Supervised Disentanglement (2025-02-11)
- Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning (2025-02-05)
- Patch-aware Vector Quantized Codebook Learning for Unsupervised Visual Defect Detection (2025-01-15)