
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


vq-wav2vec: Self-Supervised Learning of Discrete Speech Representations

Alexei Baevski, Steffen Schneider, Michael Auli

Date: 2019-10-12
Venue: ICLR 2020
Tags: Speech Recognition, Self-Supervised Learning, Clustering, General Classification

Abstract

We propose vq-wav2vec to learn discrete representations of audio segments through a wav2vec-style self-supervised context prediction task. The algorithm uses either a Gumbel-Softmax or online k-means clustering to quantize the dense representations. Discretization enables the direct application of algorithms from the NLP community which require discrete inputs. Experiments show that BERT pre-training achieves a new state of the art on TIMIT phoneme classification and WSJ speech recognition.
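
To make the quantization idea concrete, here is a minimal sketch of the nearest-centroid assignment that k-means-style quantization relies on: each dense frame is replaced by the index of its closest codebook entry. It is an illustration only, not the authors' implementation. The function names, the toy codebook, and the toy frames are all hypothetical, and the sketch leaves out codebook learning and the Gumbel-Softmax variant.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames: Sequence[Sequence[float]],
             codebook: Sequence[Sequence[float]]) -> List[int]:
    """Map each dense frame to the index of its nearest codebook entry."""
    indices = []
    for frame in frames:
        distances = [squared_distance(frame, code) for code in codebook]
        # Ties resolve to the lowest index, since list.index returns the first match.
        indices.append(distances.index(min(distances)))
    return indices


# Hypothetical toy data: three 2-D "frames" and a two-entry codebook.
codebook = [[0.0, 0.0], [1.0, 1.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [0.4, 0.3]]
tokens = quantize(frames, codebook)
print(tokens)
```

The resulting list of integer indices is the kind of discrete sequence that a model expecting token inputs, such as BERT, can consume directly.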

Results

Task | Dataset | Metric | Value | Model
Speech Recognition | TIMIT | Percentage error | 11.6 | vq-wav2vec

Related Papers

Tri-Learn Graph Fusion Network for Attributed Graph Clustering (2025-07-18)
Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine (2025-07-17)
NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech (2025-07-17)
A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
Ranking Vectors Clustering: Theory and Applications (2025-07-16)
WhisperKit: On-device Real-time ASR with Billion-Scale Transformers (2025-07-14)
Self-supervised Learning on Camera Trap Footage Yields a Strong Universal Face Embedder (2025-07-14)
Car Object Counting and Position Estimation via Extension of the CLIP-EBC Framework (2025-07-11)