Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Speech Recognition on LibriSpeech test-clean

Metric: Word Error Rate (WER) (lower is better)
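The page does not spell out how the metric is computed. As a minimal sketch, assuming the conventional definition WER = (S + D + I) / N (substitutions, deletions, and insertions from a word-level edit-distance alignment, divided by the number of reference words) and assuming the leaderboard values are percentages, it could be computed as follows. The function name `word_error_rate` and the whitespace tokenization are illustrative choices, not something taken from this page; individual papers may apply their own text normalization before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: 100 * (S + D + I) / N.

    Assumption: words are split on whitespace with no further
    normalization (casing, punctuation) applied.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("the" -> "a") in six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Under these assumptions, a single substituted word in a six-word reference gives a WER of about 16.7, so a score near 2 corresponds to roughly one word error per fifty reference words.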

Results

| # | Model | Word Error Rate (WER) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | United Med ASR | 0.985 | Yes | High-precision medical speech recognition throug... | 2024-11-24 | - |
| 2 | SAMBA ASR | 1.17 | Yes | Samba-ASR: State-Of-The-Art Speech Recognition L... | 2025-01-06 | - |
| 3 | FAdam | 1.34 | Yes | FAdam: Adam is a natural gradient optimizer usin... | 2024-05-21 | Code |
| 4 | Conformer + Wav2vec 2.0 + SpecAugment-based Noisy Student Training with Libri-Light | 1.4 | Yes | Pushing the Limits of Semi-Supervised Learning f... | 2020-10-20 | Code |
| 5 | w2v-BERT XXL | 1.4 | Yes | W2v-BERT: Combining Contrastive Learning and Mas... | 2021-08-07 | Code |
| 6 | parakeet-rnnt-1.1b | 1.46 | Yes | Fast Conformer with Linearly Scalable Attention ... | 2023-05-08 | - |
| 7 | Conv + Transformer + wav2vec2.0 + pseudo labeling | 1.5 | Yes | Self-training and Pre-training are Complementary... | 2020-10-22 | Code |
| 8 | ContextNet + SpecAugment-based Noisy Student Training with Libri-Light | 1.7 | Yes | Improved Noisy Student Training for Automatic Sp... | 2020-05-19 | Code |
| 9 | SpeechStew (1B) | 1.7 | Yes | SpeechStew: Simply Mix All Available Speech Reco... | 2021-04-05 | - |
| 10 | Multistream CNN with Self-Attentive SRU (WER includes text normalization) | 1.75 | Yes | ASAPP-ASR: Multistream CNN and Self-Attentive SR... | 2020-05-21 | - |
| 11 | Stateformer | 1.76 | No | Multi-Head State Space Model for Speech Recognit... | 2023-05-21 | - |
| 12 | wav2vec 2.0 with Libri-Light | 1.8 | Yes | wav2vec 2.0: A Framework for Self-Supervised Lea... | 2020-06-20 | Code |
| 13 | HuBERT with Libri-Light | 1.8 | Yes | HuBERT: Self-Supervised Speech Representation Le... | 2021-06-14 | Code |
| 14 | WavLM Large | 1.8 | No | WavLM: Large-Scale Self-Supervised Pre-Training ... | 2021-10-26 | Code |
| 15 | E-Branchformer (L) + Internal Language Model Estimation | 1.81 | No | E-Branchformer: Branchformer with Enhanced mergi... | 2022-09-30 | Code |
| 16 | Zipformer+pruned transducer w/ CR-CTC (no external language model) | 1.88 | No | CR-CTC: Consistency regularization on CTC for im... | 2024-10-07 | Code |
| 17 | ContextNet(L) | 1.9 | No | ContextNet: Improving Convolutional Neural Netwo... | 2020-05-07 | Code |
| 18 | Conformer(L) | 1.9 | No | Conformer: Convolution-augmented Transformer for... | 2020-05-16 | Code |
| 19 | Transformer+Time reduction+Self Knowledge distillation | 1.9 | No | Transformer-based ASR Incorporating Time-reducti... | 2021-03-17 | - |
| 20 | ContextNet(M) | 2 | Yes | ContextNet: Improving Convolutional Neural Netwo... | 2020-05-07 | Code |
| 21 | Transformer Transducer | 2 | No | Improving RNN Transducer Based ASR with Auxiliar... | 2020-11-05 | Code |
| 22 | Conformer(M) | 2 | Yes | Conformer: Convolution-augmented Transformer for... | 2020-05-16 | Code |
| 23 | SpeechStew (100M) | 2 | No | SpeechStew: Simply Mix All Available Speech Reco... | 2021-04-05 | - |
| 24 | Qwen-Audio | 2 | No | Qwen-Audio: Advancing Universal Audio Understand... | 2023-11-14 | Code |
| 25 | Zipformer+pruned transducer (no external language model) | 2 | No | Zipformer: A faster and better encoder for autom... | 2023-10-17 | Code |
| 26 | Zipformer+CR-CTC (no external language model) | 2.02 | No | CR-CTC: Consistency regularization on CTC for im... | 2024-10-07 | Code |
| 27 | Conv + Transformer AM + Pseudo-Labeling (ConvLM with Transformer Rescoring) | 2.03 | No | End-to-end ASR: from Supervised to Semi-Supervis... | 2019-11-19 | Code |
| 28 | Conv + Transformer AM + Iterative Pseudo-Labeling (n-gram LM + Transformer Rescoring) | 2.1 | No | Iterative Pseudo-Labeling for Speech Recognition | 2020-05-19 | Code |
| 29 | CTC + Transformer LM rescoring | 2.1 | No | Faster, Simpler and More Accurate Hybrid ASR Sys... | 2020-05-19 | - |
| 30 | Conformer(S) | 2.1 | No | Conformer: Convolution-augmented Transformer for... | 2020-05-16 | Code |
| 31 | Branchformer + GFSA | 2.11 | No | Graph Convolutions Enrich the Self-Attention in ... | 2023-12-07 | Code |
| 32 | Multi-Stream Self-Attention With Dilated 1D Convolutions | 2.2 | No | State-of-the-Art Speech Recognition Using Multi-... | 2019-10-01 | Code |
| 33 | LSTM Transducer | 2.23 | Yes | Librispeech Transducer Model with Internal Langu... | 2021-04-07 | Code |
| 34 | Hybrid + Transformer LM rescoring | 2.26 | No | Transformer-based Acoustic Modeling for Hybrid S... | 2019-10-22 | - |
| 35 | Hybrid model with Transformer rescoring | 2.3 | No | RWTH ASR Systems for LibriSpeech: Hybrid vs Atte... | 2019-05-08 | Code |
| 36 | ContextNet(S) | 2.3 | Yes | ContextNet: Improving Convolutional Neural Netwo... | 2020-05-07 | Code |
| 37 | Conv + Transformer AM (ConvLM with Transformer Rescoring) (LS only) | 2.31 | No | End-to-end ASR: from Supervised to Semi-Supervis... | 2019-11-19 | Code |
| 38 | Squeezeformer (L) | 2.47 | No | Squeezeformer: An Efficient Transformer for Auto... | 2022-06-02 | Code |
| 39 | LAS + SpecAugment | 2.5 | Yes | SpecAugment: A Simple Data Augmentation Method f... | 2019-04-18 | Code |
| 40 | Transformer | 2.6 | Yes | A Comparative Study on Transformer vs RNN in Spe... | 2019-09-13 | Code |
| 41 | QuartzNet15x5 | 2.69 | No | - | - | Code |
| 42 | LAS (no LM) | 2.7 | Yes | SpecAugment: A Simple Data Augmentation Method f... | 2019-04-18 | Code |
| 43 | wav2vec_wav2letter | 2.7 | No | Self-training and Pre-training are Complementary... | 2020-10-22 | Code |
| 44 | Espresso | 2.8 | No | Espresso: A Fast End-to-end Neural Speech Recogn... | 2019-09-18 | Code |
| 45 | Jasper DR 10x5 (+ Time/Freq Masks) | 2.84 | No | Jasper: An End-to-End Convolutional Neural Acous... | 2019-04-05 | Code |
| 46 | Jasper DR 10x5 | 2.95 | No | Jasper: An End-to-End Convolutional Neural Acous... | 2019-04-05 | Code |
| 47 | tdnn + chain + rnnlm rescoring | 3.06 | No | - | - | - |
| 48 | Convolutional Speech Recognition | 3.26 | Yes | Fully Convolutional Speech Recognition | 2018-12-17 | - |
| 49 | MT4SSL | 3.4 | No | MT4SSL: Boosting Self-Supervised Speech Represen... | 2022-11-14 | Code |
| 50 | Model Unit Exploration | 3.6 | No | On the Choice of Modeling Unit for Sequence-to-S... | 2019-02-05 | Code |
| 51 | Seq-to-seq attention | 3.82 | Yes | Improved training of end-to-end attention models... | 2018-05-08 | Code |
| 52 | CTC-CRF 4gram-LM | 4.09 | No | - | - | Code |
| 53 | HMM-TDNN trained with MMI + data augmentation (speed) + iVectors + 3 regularizations | 4.3 | No | - | - | - |
| 54 | Centaurus (30 M) | 4.4 | No | Let SSMs be ConvNets: State-space Modeling with ... | 2025-01-22 | - |
| 55 | HMM-TDNN + iVectors | 4.8 | Yes | - | - | - |
| 56 | Gated ConvNets | 4.8 | No | Letter-Based Speech Recognition with Gated ConvN... | 2017-12-22 | Code |
| 57 | Deep Speech 2 | 5.33 | No | Deep Speech 2: End-to-End Speech Recognition in ... | 2015-12-08 | Code |
| 58 | CTC + policy learning | 5.42 | No | Improving End-to-End Speech Recognition with Pol... | 2017-12-19 | - |
| 59 | HMM-DNN + pNorm* | 5.5 | Yes | - | - | - |
| 60 | Li-GRU | 6.2 | No | The PyTorch-Kaldi Speech Recognition Toolkit | 2018-11-19 | Code |
| 61 | Snips | 6.4 | No | Snips Voice Platform: an embedded Spoken Languag... | 2018-05-25 | Code |
| 62 | Local Prior Matching (Large Model) | 7.19 | No | Semi-Supervised Speech Recognition via Local Pri... | 2020-02-24 | Code |
| 63 | HMM-(SAT)GMM | 8 | Yes | - | - | - |
| 64 | AmNet | 8.6 | No | Amortized Neural Networks for Low-Latency Speech... | 2021-08-03 | - |