
Libri-Light: A Benchmark for ASR with Limited or No Supervision

Jacob Kahn, Morgane Rivière, Weiyi Zheng, Evgeny Kharitonov, Qiantong Xu, Pierre-Emmanuel Mazaré, Julien Karadayi, Vitaliy Liptchinsky, Ronan Collobert, Christian Fuegen, Tatiana Likhomanenko, Gabriel Synnaeve, Armand Joulin, Abdel-rahman Mohamed, Emmanuel Dupoux

2019-12-17 | Speech Recognition

Abstract

We introduce a new collection of spoken English audio suitable for training speech recognition systems under limited or no supervision. It is derived from open-source audio books from the LibriVox project. It contains over 60K hours of audio, which is, to our knowledge, the largest freely-available corpus of speech. The audio has been segmented using voice activity detection and is tagged with SNR, speaker ID and genre descriptions. Additionally, we provide baseline systems and evaluation metrics working under three settings: (1) the zero resource/unsupervised setting (ABX), (2) the semi-supervised setting (PER, CER) and (3) the distant supervision setting (WER). Settings (2) and (3) use limited textual resources (10 minutes to 10 hours) aligned with the speech. Setting (3) uses large amounts of unaligned text. They are evaluated on the standard LibriSpeech dev and test sets for comparison with the supervised state-of-the-art.
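The PER, CER and WER metrics named in the abstract are all edit-distance error rates, differing only in the unit that is compared (phonemes, characters or words). The following is a minimal sketch of that computation, not the benchmark's official scoring code: the function names are hypothetical, and details such as text normalization and whether spaces count as characters are assumptions.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the first i-1 reference tokens
    # and the first j hypothesis tokens.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """Edit distance as a percentage of the reference length."""
    if len(ref) == 0:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def wer(ref: str, hyp: str) -> float:
    """Word error rate: tokens are whitespace-separated words."""
    return error_rate(ref.split(), hyp.split())


def cer(ref: str, hyp: str) -> float:
    """Character error rate: tokens are characters (spaces included here)."""
    return error_rate(list(ref), list(hyp))
```

A PER would follow the same pattern by passing phoneme sequences to `error_rate`. Because insertions count as errors, these rates can exceed 100.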

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Speech Recognition | Libri-Light test-other | Word Error Rate (WER) | 56.6 | TDS 60k pseudo-label + CTC fine-tuning + 4gram-LM |
| Speech Recognition | Libri-Light test-other | Word Error Rate (WER) | 69.5 | CPC unlab-60k+train-10h CPC pretrain + CTC fine-tuning + 4gram-LM |
| Speech Recognition | Libri-Light test-other | ABX-across | 13.42 | CPC unlab-60k |
| Speech Recognition | Libri-Light test-other | ABX-within | 8.14 | CPC unlab-60k |
| Speech Recognition | Libri-Light test-clean | Word Error Rate (WER) | 29.3 | TDS 60k pseudo-label + CTC fine-tuning + 4gram-LM |
| Speech Recognition | Libri-Light test-clean | Word Error Rate (WER) | 43.9 | CPC unlab-60k+train-10h CPC pretrain + CTC fine-tuning + 4gram-LM |
| Speech Recognition | Libri-Light test-clean | ABX-across | 7.56 | CPC unlab-60k |
| Speech Recognition | Libri-Light test-clean | ABX-within | 5.83 | CPC unlab-60k |
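The ABX rows above report a discriminability error: for a triple (A, B, X) where A and X are instances of the same sound category and B is a different one, an error is counted when X lies closer to B than to A. The sketch below illustrates the idea under several loudly simplifying assumptions: each token is a single fixed-length vector compared by cosine distance (real evaluations typically compare variable-length frame sequences), all triples are pooled rather than averaged per category pair, and the `Token` type and `abx_error` function are hypothetical names. "Within" is taken to mean that A, B and X share a speaker, and "across" that X comes from a different speaker than A and B.

```python
import math
from itertools import product
from typing import NamedTuple, Sequence


class Token(NamedTuple):
    phone: str              # sound category label
    speaker: str            # speaker identifier
    vec: Sequence[float]    # learned representation (simplified to one vector)


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def abx_error(tokens: Sequence[Token], across: bool) -> float:
    """Percentage of (A, B, X) triples where X is closer to B than to A."""
    errors = 0.0
    total = 0
    for a, b, x in product(tokens, repeat=3):
        # A and X must be distinct tokens of the same category; B a different one.
        if a is x or a.phone != x.phone or b.phone == a.phone:
            continue
        # A and B always come from the same speaker.
        if a.speaker != b.speaker:
            continue
        # Within: X shares that speaker. Across: X comes from another speaker.
        if across == (x.speaker == a.speaker):
            continue
        d_ax = cosine_distance(a.vec, x.vec)
        d_bx = cosine_distance(b.vec, x.vec)
        if d_ax > d_bx:
            errors += 1.0
        elif d_ax == d_bx:
            errors += 0.5   # ties count as half an error
        total += 1
    if total == 0:
        raise ValueError("no valid (A, B, X) triples")
    return 100.0 * errors / total
```

Lower is better: a representation that separates sound categories regardless of speaker should score near 0 in both conditions, while chance level is 50.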
