Speech Recognition on LibriSpeech test-clean
Metric: Word Error Rate (WER) (lower is better)
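For reference, WER is conventionally defined as (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn the reference transcript into the hypothesis, and N is the number of reference words. The sketch below is a minimal illustration of that definition, not the scoring script used by any entry on this page. The function name `word_error_rate` is hypothetical, tokenization is plain whitespace splitting, and no text normalization is applied (entries may differ on this; one model name on this page notes that its WER includes text normalization). The result is scaled to a percentage, on the assumption that the leaderboard values are percentages.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage, using word-level Levenshtein distance.

    Assumptions (not taken from the leaderboard): whitespace tokenization,
    no casing or punctuation normalization, result scaled by 100.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(round(word_error_rate("the cat sat on the mat", "the cat sat on mat"), 2))  # 16.67
```

Because insertions count as errors, WER can exceed 100 when the hypothesis is much longer than the reference.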
Results
| # | Model | Badge | Word Error Rate (WER) | Extra Data | Paper | Date | Code link |
|---|---|---|---|---|---|---|---|
| 1 | United Med ASR | SOTA | 0.985 | Yes | High-precision medical speech recognition through synthetic data and semantic correction: UNITED-MEDASR | 2024-11-24 | - |
| 2 | SAMBA ASR | - | 1.17 | Yes | Samba-ASR: State-Of-The-Art Speech Recognition Leveraging Structured State-Space Models | 2025-01-06 | - |
| 3 | FAdam | SOTA | 1.34 | Yes | FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information | 2024-05-21 | Yes |
| 4 | Conformer + Wav2vec 2.0 + SpecAugment-based Noisy Student Training with Libri-Light | SOTA | 1.4 | Yes | Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition | 2020-10-20 | Yes |
| 5 | w2v-BERT XXL | - | 1.4 | Yes | W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training | 2021-08-07 | Yes |
| 6 | parakeet-rnnt-1.1b | - | 1.46 | Yes | Fast Conformer with Linearly Scalable Attention for Efficient Speech Recognition | 2023-05-08 | - |
| 7 | Conv + Transformer + wav2vec2.0 + pseudo labeling | - | 1.5 | Yes | Self-training and Pre-training are Complementary for Speech Recognition | 2020-10-22 | Yes |
| 8 | ContextNet + SpecAugment-based Noisy Student Training with Libri-Light | SOTA | 1.7 | Yes | Improved Noisy Student Training for Automatic Speech Recognition | 2020-05-19 | Yes |
| 9 | SpeechStew (1B) | - | 1.7 | Yes | SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network | 2021-04-05 | - |
| 10 | Multistream CNN with Self-Attentive SRU (WER includes text normalization) | - | 1.75 | Yes | ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition | 2020-05-21 | - |
| 11 | Stateformer | - | 1.76 | No | Multi-Head State Space Model for Speech Recognition | 2023-05-21 | - |
| 12 | wav2vec 2.0 with Libri-Light | - | 1.8 | Yes | wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations | 2020-06-20 | Yes |
| 13 | HuBERT with Libri-Light | - | 1.8 | Yes | HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units | 2021-06-14 | Yes |
| 14 | WavLM Large | - | 1.8 | No | WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing | 2021-10-26 | Yes |
| 15 | E-Branchformer (L) + Internal Language Model Estimation | - | 1.81 | No | E-Branchformer: Branchformer with Enhanced merging for speech recognition | 2022-09-30 | Yes |
| 16 | Zipformer+pruned transducer w/ CR-CTC (no external language model) | - | 1.88 | No | CR-CTC: Consistency regularization on CTC for improved speech recognition | 2024-10-07 | Yes |
| 17 | ContextNet(L) | SOTA | 1.9 | No | ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context | 2020-05-07 | Yes |
| 18 | Conformer(L) | - | 1.9 | No | Conformer: Convolution-augmented Transformer for Speech Recognition | 2020-05-16 | Yes |
| 19 | Transformer+Time reduction+Self Knowledge distillation | - | 1.9 | No | Transformer-based ASR Incorporating Time-reduction Layer and Fine-tuning with Self-Knowledge Distillation | 2021-03-17 | - |
| 20 | ContextNet(M) | SOTA | 2 | Yes | ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context | 2020-05-07 | Yes |
| 21 | Transformer Transducer | - | 2 | No | Improving RNN Transducer Based ASR with Auxiliary Tasks | 2020-11-05 | Yes |
| 22 | Conformer(M) | - | 2 | Yes | Conformer: Convolution-augmented Transformer for Speech Recognition | 2020-05-16 | Yes |
| 23 | SpeechStew (100M) | - | 2 | No | SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network | 2021-04-05 | - |
| 24 | Qwen-Audio | - | 2 | No | Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models | 2023-11-14 | Yes |
| 25 | Zipformer+pruned transducer (no external language model) | - | 2 | No | Zipformer: A faster and better encoder for automatic speech recognition | 2023-10-17 | Yes |
| 26 | Zipformer+CR-CTC (no external language model) | - | 2.02 | No | CR-CTC: Consistency regularization on CTC for improved speech recognition | 2024-10-07 | Yes |
| 27 | Conv + Transformer AM + Pseudo-Labeling (ConvLM with Transformer Rescoring) | SOTA | 2.03 | No | End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures | 2019-11-19 | Yes |
| 28 | Conv + Transformer AM + Iterative Pseudo-Labeling (n-gram LM + Transformer Rescoring) | - | 2.1 | No | Iterative Pseudo-Labeling for Speech Recognition | 2020-05-19 | Yes |
| 29 | CTC + Transformer LM rescoring | - | 2.1 | No | Faster, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces | 2020-05-19 | - |
| 30 | Conformer(S) | - | 2.1 | No | Conformer: Convolution-augmented Transformer for Speech Recognition | 2020-05-16 | Yes |
| 31 | Branchformer + GFSA | - | 2.11 | No | Graph Convolutions Enrich the Self-Attention in Transformers! | 2023-12-07 | Yes |
| 32 | Multi-Stream Self-Attention With Dilated 1D Convolutions | SOTA | 2.2 | No | State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention With Dilated 1D Convolutions | 2019-10-01 | Yes |
| 33 | LSTM Transducer | - | 2.23 | Yes | Librispeech Transducer Model with Internal Language Model Prior Correction | 2021-04-07 | Yes |
| 34 | Hybrid + Transformer LM rescoring | - | 2.26 | No | Transformer-based Acoustic Modeling for Hybrid Speech Recognition | 2019-10-22 | - |
| 35 | Hybrid model with Transformer rescoring | SOTA | 2.3 | No | RWTH ASR Systems for LibriSpeech: Hybrid vs Attention -- w/o Data Augmentation | 2019-05-08 | Yes |
| 36 | ContextNet(S) | - | 2.3 | Yes | ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context | 2020-05-07 | Yes |
| 37 | Conv + Transformer AM (ConvLM with Transformer Rescoring) (LS only) | - | 2.31 | No | End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures | 2019-11-19 | Yes |
| 38 | Squeezeformer (L) | - | 2.47 | No | Squeezeformer: An Efficient Transformer for Automatic Speech Recognition | 2022-06-02 | Yes |
| 39 | LAS + SpecAugment | SOTA | 2.5 | Yes | SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition | 2019-04-18 | Yes |
| 40 | Transformer | - | 2.6 | Yes | A Comparative Study on Transformer vs RNN in Speech Applications | 2019-09-13 | Yes |
| 41 | QuartzNet15x5 | - | 2.69 | No | No paper | - | Yes |
| 42 | LAS (no LM) | SOTA | 2.7 | Yes | SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition | 2019-04-18 | Yes |
| 43 | wav2vec_wav2letter | - | 2.7 | No | Self-training and Pre-training are Complementary for Speech Recognition | 2020-10-22 | Yes |
| 44 | Espresso | - | 2.8 | No | Espresso: A Fast End-to-end Neural Speech Recognition Toolkit | 2019-09-18 | Yes |
| 45 | Jasper DR 10x5 (+ Time/Freq Masks) | SOTA | 2.84 | No | Jasper: An End-to-End Convolutional Neural Acoustic Model | 2019-04-05 | Yes |
| 46 | Jasper DR 10x5 | SOTA | 2.95 | No | Jasper: An End-to-End Convolutional Neural Acoustic Model | 2019-04-05 | Yes |
| 47 | tdnn + chain + rnnlm rescoring | - | 3.06 | No | No paper | - | - |
| 48 | Convolutional Speech Recognition | SOTA | 3.26 | Yes | Fully Convolutional Speech Recognition | 2018-12-17 | - |
| 49 | MT4SSL | - | 3.4 | No | MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets | 2022-11-14 | Yes |
| 50 | Model Unit Exploration | - | 3.6 | No | On the Choice of Modeling Unit for Sequence-to-Sequence Speech Recognition | 2019-02-05 | Yes |
| 51 | Seq-to-seq attention | SOTA | 3.82 | Yes | Improved training of end-to-end attention models for speech recognition | 2018-05-08 | Yes |
| 52 | CTC-CRF 4gram-LM | - | 4.09 | No | No paper | - | Yes |
| 53 | HMM-TDNN trained with MMI + data augmentation (speed) + iVectors + 3 regularizations | - | 4.3 | No | No paper | - | - |
| 54 | Centaurus (30 M) | - | 4.4 | No | Let SSMs be ConvNets: State-space Modeling with Optimal Tensor Contractions | 2025-01-22 | - |
| 55 | HMM-TDNN + iVectors | - | 4.8 | Yes | No paper | - | - |
| 56 | Gated ConvNets | SOTA | 4.8 | No | Letter-Based Speech Recognition with Gated ConvNets | 2017-12-22 | Yes |
| 57 | Deep Speech 2 | SOTA | 5.33 | No | Deep Speech 2: End-to-End Speech Recognition in English and Mandarin | 2015-12-08 | Yes |
| 58 | CTC + policy learning | - | 5.42 | No | Improving End-to-End Speech Recognition with Policy Learning | 2017-12-19 | - |
| 59 | HMM-DNN + pNorm* | - | 5.5 | Yes | No paper | - | - |
| 60 | Li-GRU | - | 6.2 | No | The PyTorch-Kaldi Speech Recognition Toolkit | 2018-11-19 | Yes |
| 61 | Snips | - | 6.4 | No | Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces | 2018-05-25 | Yes |
| 62 | Local Prior Matching (Large Model) | - | 7.19 | No | Semi-Supervised Speech Recognition via Local Prior Matching | 2020-02-24 | Yes |
| 63 | HMM-(SAT)GMM | - | 8 | Yes | No paper | - | - |
| 64 | AmNet | - | 8.6 | No | Amortized Neural Networks for Low-Latency Speech Recognition | 2021-08-03 | - |