Speech Recognition on LibriSpeech test-clean

Metric: Word Error Rate (WER) (lower is better)
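
As a rough illustration of the metric, WER is commonly defined as the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The sketch below assumes simple whitespace tokenization and no text normalization; the function name `word_error_rate` is hypothetical, and individual leaderboard entries may score differently (one entry notes that its WER includes text normalization). The table values look like percentages, which is also an assumption here, so the fraction returned below would be multiplied by 100 to compare.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: word-level edit distance / reference length.

    Assumes whitespace tokenization and no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {100 * wer:.2f}%")
```

Because the denominator is the reference length, insertions can push WER above 100%, which is why lower is better but the value is not bounded by 1.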

Results

#	Model	Word Error Rate (WER)	Extra Data	Paper	Date	Code
1	United Med ASR	0.985	Yes	High-precision medical speech recognition throug...	2024-11-24	-
2	SAMBA ASR	1.17	Yes	Samba-ASR: State-Of-The-Art Speech Recognition L...	2025-01-06	-
3	FAdam	1.34	Yes	FAdam: Adam is a natural gradient optimizer usin...	2024-05-21	Code
4	Conformer + Wav2vec 2.0 + SpecAugment-based Noisy Student Training with Libri-Light	1.4	Yes	Pushing the Limits of Semi-Supervised Learning f...	2020-10-20	Code
5	w2v-BERT XXL	1.4	Yes	W2v-BERT: Combining Contrastive Learning and Mas...	2021-08-07	Code
6	parakeet-rnnt-1.1b	1.46	Yes	Fast Conformer with Linearly Scalable Attention ...	2023-05-08	-
7	Conv + Transformer + wav2vec2.0 + pseudo labeling	1.5	Yes	Self-training and Pre-training are Complementary...	2020-10-22	Code
8	ContextNet + SpecAugment-based Noisy Student Training with Libri-Light	1.7	Yes	Improved Noisy Student Training for Automatic Sp...	2020-05-19	Code
9	SpeechStew (1B)	1.7	Yes	SpeechStew: Simply Mix All Available Speech Reco...	2021-04-05	-
10	Multistream CNN with Self-Attentive SRU (WER includes text normalization)	1.75	Yes	ASAPP-ASR: Multistream CNN and Self-Attentive SR...	2020-05-21	-
11	Stateformer	1.76	No	Multi-Head State Space Model for Speech Recognit...	2023-05-21	-
12	wav2vec 2.0 with Libri-Light	1.8	Yes	wav2vec 2.0: A Framework for Self-Supervised Lea...	2020-06-20	Code
13	HuBERT with Libri-Light	1.8	Yes	HuBERT: Self-Supervised Speech Representation Le...	2021-06-14	Code
14	WavLM Large	1.8	No	WavLM: Large-Scale Self-Supervised Pre-Training ...	2021-10-26	Code
15	E-Branchformer (L) + Internal Language Model Estimation	1.81	No	E-Branchformer: Branchformer with Enhanced mergi...	2022-09-30	Code
16	Zipformer+pruned transducer w/ CR-CTC (no external language model)	1.88	No	CR-CTC: Consistency regularization on CTC for im...	2024-10-07	Code
17	ContextNet(L)	1.9	No	ContextNet: Improving Convolutional Neural Netwo...	2020-05-07	Code
18	Conformer(L)	1.9	No	Conformer: Convolution-augmented Transformer for...	2020-05-16	Code
19	Transformer+Time reduction+Self Knowledge distillation	1.9	No	Transformer-based ASR Incorporating Time-reducti...	2021-03-17	-
20	ContextNet(M)	2	Yes	ContextNet: Improving Convolutional Neural Netwo...	2020-05-07	Code
21	Transformer Transducer	2	No	Improving RNN Transducer Based ASR with Auxiliar...	2020-11-05	Code
22	Conformer(M)	2	Yes	Conformer: Convolution-augmented Transformer for...	2020-05-16	Code
23	SpeechStew (100M)	2	No	SpeechStew: Simply Mix All Available Speech Reco...	2021-04-05	-
24	Qwen-Audio	2	No	Qwen-Audio: Advancing Universal Audio Understand...	2023-11-14	Code
25	Zipformer+pruned transducer (no external language model)	2	No	Zipformer: A faster and better encoder for autom...	2023-10-17	Code
26	Zipformer+CR-CTC (no external language model)	2.02	No	CR-CTC: Consistency regularization on CTC for im...	2024-10-07	Code
27	Conv + Transformer AM + Pseudo-Labeling (ConvLM with Transformer Rescoring)	2.03	No	End-to-end ASR: from Supervised to Semi-Supervis...	2019-11-19	Code
28	Conv + Transformer AM + Iterative Pseudo-Labeling (n-gram LM + Transformer Rescoring)	2.1	No	Iterative Pseudo-Labeling for Speech Recognition	2020-05-19	Code
29	CTC + Transformer LM rescoring	2.1	No	Faster, Simpler and More Accurate Hybrid ASR Sys...	2020-05-19	-
30	Conformer(S)	2.1	No	Conformer: Convolution-augmented Transformer for...	2020-05-16	Code
31	Branchformer + GFSA	2.11	No	Graph Convolutions Enrich the Self-Attention in ...	2023-12-07	Code
32	Multi-Stream Self-Attention With Dilated 1D Convolutions	2.2	No	State-of-the-Art Speech Recognition Using Multi-...	2019-10-01	Code
33	LSTM Transducer	2.23	Yes	Librispeech Transducer Model with Internal Langu...	2021-04-07	Code
34	Hybrid + Transformer LM rescoring	2.26	No	Transformer-based Acoustic Modeling for Hybrid S...	2019-10-22	-
35	Hybrid model with Transformer rescoring	2.3	No	RWTH ASR Systems for LibriSpeech: Hybrid vs Atte...	2019-05-08	Code
36	ContextNet(S)	2.3	Yes	ContextNet: Improving Convolutional Neural Netwo...	2020-05-07	Code
37	Conv + Transformer AM (ConvLM with Transformer Rescoring) (LS only)	2.31	No	End-to-end ASR: from Supervised to Semi-Supervis...	2019-11-19	Code
38	Squeezeformer (L)	2.47	No	Squeezeformer: An Efficient Transformer for Auto...	2022-06-02	Code
39	LAS + SpecAugment	2.5	Yes	SpecAugment: A Simple Data Augmentation Method f...	2019-04-18	Code
40	Transformer	2.6	Yes	A Comparative Study on Transformer vs RNN in Spe...	2019-09-13	Code
41	QuartzNet15x5	2.69	No	-	-	Code
42	LAS (no LM)	2.7	Yes	SpecAugment: A Simple Data Augmentation Method f...	2019-04-18	Code
43	wav2vec_wav2letter	2.7	No	Self-training and Pre-training are Complementary...	2020-10-22	Code
44	Espresso	2.8	No	Espresso: A Fast End-to-end Neural Speech Recogn...	2019-09-18	Code
45	Jasper DR 10x5 (+ Time/Freq Masks)	2.84	No	Jasper: An End-to-End Convolutional Neural Acous...	2019-04-05	Code
46	Jasper DR 10x5	2.95	No	Jasper: An End-to-End Convolutional Neural Acous...	2019-04-05	Code
47	tdnn + chain + rnnlm rescoring	3.06	No	-	-	-
48	Convolutional Speech Recognition	3.26	Yes	Fully Convolutional Speech Recognition	2018-12-17	-
49	MT4SSL	3.4	No	MT4SSL: Boosting Self-Supervised Speech Represen...	2022-11-14	Code
50	Model Unit Exploration	3.6	No	On the Choice of Modeling Unit for Sequence-to-S...	2019-02-05	Code
51	Seq-to-seq attention	3.82	Yes	Improved training of end-to-end attention models...	2018-05-08	Code
52	CTC-CRF 4gram-LM	4.09	No	-	-	Code
53	HMM-TDNN trained with MMI + data augmentation (speed) + iVectors + 3 regularizations	4.3	No	-	-	-
54	Centaurus (30 M)	4.4	No	Let SSMs be ConvNets: State-space Modeling with ...	2025-01-22	-
55	HMM-TDNN + iVectors	4.8	Yes	-	-	-
56	Gated ConvNets	4.8	No	Letter-Based Speech Recognition with Gated ConvN...	2017-12-22	Code
57	Deep Speech 2	5.33	No	Deep Speech 2: End-to-End Speech Recognition in ...	2015-12-08	Code
58	CTC + policy learning	5.42	No	Improving End-to-End Speech Recognition with Pol...	2017-12-19	-
59	HMM-DNN + pNorm*	5.5	Yes	-	-	-
60	Li-GRU	6.2	No	The PyTorch-Kaldi Speech Recognition Toolkit	2018-11-19	Code
61	Snips	6.4	No	Snips Voice Platform: an embedded Spoken Languag...	2018-05-25	Code
62	Local Prior Matching (Large Model)	7.19	No	Semi-Supervised Speech Recognition via Local Pri...	2020-02-24	Code
63	HMM-(SAT)GMM	8	Yes	-	-	-
64	AmNet	8.6	No	Amortized Neural Networks for Low-Latency Speech...	2021-08-03	-

#1 United Med ASR · SOTA
0.985
Word Error Rate (WER)· Extra Data· 2024-11-24
High-precision medical speech recognition through synthetic data and semantic correction: UNITED-MEDASR
#2 SAMBA ASR
1.17
Word Error Rate (WER)· Extra Data· 2025-01-06
Samba-ASR: State-Of-The-Art Speech Recognition Leveraging Structured State-Space Models
#3 FAdam · SOTA
1.34
Word Error Rate (WER)· Extra Data· 2024-05-21
FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information Code
#4 Conformer + Wav2vec 2.0 + SpecAugment-based Noisy Student Training with Libri-Light · SOTA
1.4
Word Error Rate (WER)· Extra Data· 2020-10-20
Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition Code
#5 w2v-BERT XXL
1.4
Word Error Rate (WER)· Extra Data· 2021-08-07
W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training Code
#6 parakeet-rnnt-1.1b
1.46
Word Error Rate (WER)· Extra Data· 2023-05-08
Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition
#7 Conv + Transformer + wav2vec2.0 + pseudo labeling
1.5
Word Error Rate (WER)· Extra Data· 2020-10-22
Self-training and Pre-training are Complementary for Speech Recognition Code
#8 ContextNet + SpecAugment-based Noisy Student Training with Libri-Light · SOTA
1.7
Word Error Rate (WER)· Extra Data· 2020-05-19
Improved Noisy Student Training for Automatic Speech Recognition Code
#9 SpeechStew (1B)
1.7
Word Error Rate (WER)· Extra Data· 2021-04-05
SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network
#10 Multistream CNN with Self-Attentive SRU (WER includes text normalization)
1.75
Word Error Rate (WER)· Extra Data· 2020-05-21
ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition
#11 Stateformer
1.76
Word Error Rate (WER)· 2023-05-21
Multi-Head State Space Model for Speech Recognition
#12 wav2vec 2.0 with Libri-Light
1.8
Word Error Rate (WER)· Extra Data· 2020-06-20
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations Code
#13 HuBERT with Libri-Light
1.8
Word Error Rate (WER)· Extra Data· 2021-06-14
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units Code
#14 WavLM Large
1.8
Word Error Rate (WER)· 2021-10-26
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing Code
#15 E-Branchformer (L) + Internal Language Model Estimation
1.81
Word Error Rate (WER)· 2022-09-30
E-Branchformer: Branchformer with Enhanced merging for speech recognition Code
#16 Zipformer+pruned transducer w/ CR-CTC (no external language model)
1.88
Word Error Rate (WER)· 2024-10-07
CR-CTC: Consistency regularization on CTC for improved speech recognition Code
#17 ContextNet(L) · SOTA
1.9
Word Error Rate (WER)· 2020-05-07
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context Code
#18 Conformer(L)
1.9
Word Error Rate (WER)· 2020-05-16
Conformer: Convolution-augmented Transformer for Speech Recognition Code
#19 Transformer+Time reduction+Self Knowledge distillation
1.9
Word Error Rate (WER)· 2021-03-17
Transformer-based ASR Incorporating Time-reduction Layer and Fine-tuning with Self-Knowledge Distillation
#20 ContextNet(M) · SOTA
2
Word Error Rate (WER)· Extra Data· 2020-05-07
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context Code
#21 Transformer Transducer
2
Word Error Rate (WER)· 2020-11-05
Improving RNN Transducer Based ASR with Auxiliary Tasks Code
#22 Conformer(M)
2
Word Error Rate (WER)· Extra Data· 2020-05-16
Conformer: Convolution-augmented Transformer for Speech Recognition Code
#23 SpeechStew (100M)
2
Word Error Rate (WER)· 2021-04-05
SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network
#24 Qwen-Audio
2
Word Error Rate (WER)· 2023-11-14
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models Code
#25 Zipformer+pruned transducer (no external language model)
2
Word Error Rate (WER)· 2023-10-17
Zipformer: A faster and better encoder for automatic speech recognition Code
#26 Zipformer+CR-CTC (no external language model)
2.02
Word Error Rate (WER)· 2024-10-07
CR-CTC: Consistency regularization on CTC for improved speech recognition Code
#27 Conv + Transformer AM + Pseudo-Labeling (ConvLM with Transformer Rescoring) · SOTA
2.03
Word Error Rate (WER)· 2019-11-19
End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures Code
#28 Conv + Transformer AM + Iterative Pseudo-Labeling (n-gram LM + Transformer Rescoring)
2.1
Word Error Rate (WER)· 2020-05-19
Iterative Pseudo-Labeling for Speech Recognition Code
#29 CTC + Transformer LM rescoring
2.1
Word Error Rate (WER)· 2020-05-19
Faster, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces
#30 Conformer(S)
2.1
Word Error Rate (WER)· 2020-05-16
Conformer: Convolution-augmented Transformer for Speech Recognition Code
#31 Branchformer + GFSA
2.11
Word Error Rate (WER)· 2023-12-07
Graph Convolutions Enrich the Self-Attention in Transformers! Code
#32 Multi-Stream Self-Attention With Dilated 1D Convolutions · SOTA
2.2
Word Error Rate (WER)· 2019-10-01
State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention With Dilated 1D Convolutions Code
#33 LSTM Transducer
2.23
Word Error Rate (WER)· Extra Data· 2021-04-07
Librispeech Transducer Model with Internal Language Model Prior Correction Code
#34 Hybrid + Transformer LM rescoring
2.26
Word Error Rate (WER)· 2019-10-22
Transformer-based Acoustic Modeling for Hybrid Speech Recognition
#35 Hybrid model with Transformer rescoring · SOTA
2.3
Word Error Rate (WER)· 2019-05-08
RWTH ASR Systems for LibriSpeech: Hybrid vs Attention -- w/o Data Augmentation Code
#36 ContextNet(S)
2.3
Word Error Rate (WER)· Extra Data· 2020-05-07
ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context Code
#37 Conv + Transformer AM (ConvLM with Transformer Rescoring) (LS only)
2.31
Word Error Rate (WER)· 2019-11-19
End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures Code
#38 Squeezeformer (L)
2.47
Word Error Rate (WER)· 2022-06-02
Squeezeformer: An Efficient Transformer for Automatic Speech Recognition Code
#39 LAS + SpecAugment · SOTA
2.5
Word Error Rate (WER)· Extra Data· 2019-04-18
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition Code
#40 Transformer
2.6
Word Error Rate (WER)· Extra Data· 2019-09-13
A Comparative Study on Transformer vs RNN in Speech Applications Code
#41 QuartzNet15x5
2.69
Word Error Rate (WER)
No paper · Code
#42 LAS (no LM) · SOTA
2.7
Word Error Rate (WER)· Extra Data· 2019-04-18
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition Code
#43 wav2vec_wav2letter
2.7
Word Error Rate (WER)· 2020-10-22
Self-training and Pre-training are Complementary for Speech Recognition Code
#44 Espresso
2.8
Word Error Rate (WER)· 2019-09-18
Espresso: A Fast End-to-end Neural Speech Recognition Toolkit Code
#45 Jasper DR 10x5 (+ Time/Freq Masks) · SOTA
2.84
Word Error Rate (WER)· 2019-04-05
Jasper: An End-to-End Convolutional Neural Acoustic Model Code
#46 Jasper DR 10x5 · SOTA
2.95
Word Error Rate (WER)· 2019-04-05
Jasper: An End-to-End Convolutional Neural Acoustic Model Code
#47 tdnn + chain + rnnlm rescoring
3.06
Word Error Rate (WER)
No paper
#48 Convolutional Speech Recognition · SOTA
3.26
Word Error Rate (WER)· Extra Data· 2018-12-17
Fully Convolutional Speech Recognition
#49 MT4SSL
3.4
Word Error Rate (WER)· 2022-11-14
MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets Code
#50 Model Unit Exploration
3.6
Word Error Rate (WER)· 2019-02-05
On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition Code
#51 Seq-to-seq attention · SOTA
3.82
Word Error Rate (WER)· Extra Data· 2018-05-08
Improved training of end-to-end attention models for speech recognition Code
#52 CTC-CRF 4gram-LM
4.09
Word Error Rate (WER)
No paper · Code
#53 HMM-TDNN trained with MMI + data augmentation (speed) + iVectors + 3 regularizations
4.3
Word Error Rate (WER)
No paper
#54 Centaurus (30 M)
4.4
Word Error Rate (WER)· 2025-01-22
Let SSMs be ConvNets: State-space Modeling with Optimal Tensor Contractions
#55 HMM-TDNN + iVectors
4.8
Word Error Rate (WER)· Extra Data
No paper
#56 Gated ConvNets · SOTA
4.8
Word Error Rate (WER)· 2017-12-22
Letter-Based Speech Recognition with Gated ConvNets Code
#57 Deep Speech 2 · SOTA
5.33
Word Error Rate (WER)· 2015-12-08
Deep Speech 2: End-to-End Speech Recognition in English and Mandarin Code
#58 CTC + policy learning
5.42
Word Error Rate (WER)· 2017-12-19
Improving End-to-End Speech Recognition with Policy Learning
#59 HMM-DNN + pNorm*
5.5
Word Error Rate (WER)· Extra Data
No paper
#60 Li-GRU
6.2
Word Error Rate (WER)· 2018-11-19
The PyTorch-Kaldi Speech Recognition Toolkit Code
#61 Snips
6.4
Word Error Rate (WER)· 2018-05-25
Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces Code
#62 Local Prior Matching (Large Model)
7.19
Word Error Rate (WER)· 2020-02-24
Semi-Supervised Speech Recognition via Local Prior Matching Code
#63 HMM-(SAT)GMM
8
Word Error Rate (WER)· Extra Data
No paper
#64 AmNet
8.6
Word Error Rate (WER)· 2021-08-03
Amortized Neural Networks for Low-Latency Speech Recognition