Speech Recognition on LibriSpeech test-other
Metric: Word Error Rate (WER) (lower is better)
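For readers unfamiliar with the metric: WER is conventionally the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below illustrates that definition under stated assumptions: whitespace tokenization, no text normalization, and a result expressed as a percentage (the scores on this page appear to use that scale). The function name `word_error_rate` is hypothetical and is not taken from any of the listed papers or toolkits.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage, using word-level Levenshtein distance."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion (reference word missing)
                curr[j - 1] + 1,     # insertion (extra hypothesis word)
                prev[j - 1] + cost,  # substitution or exact match
            )
        prev = curr
    return 100.0 * prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("a b c d", "a x c d"))  # 25.0
```

Corpus-level scores are normally obtained by summing errors and reference words over all utterances before dividing, rather than by averaging per-utterance values.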
Results
| # | Model | Word Error Rate (WER) | SOTA | Extra Data | Paper | Date | Code |
|---|-------|-----------------------|------|------------|-------|------|------|
| 1 | SAMBA ASR | 2.48 | SOTA | No | Samba-ASR: State-Of-The-Art Speech Recognition Leveraging Structured State-Space Models | 2025-01-06 | - |
| 2 | FAdam | 2.49 | SOTA | No | FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information | 2024-05-21 | Code |
| 3 | w2v-BERT XXL | 2.5 | SOTA | No | W2v-BERT: Combining Contrastive Learning and Masked Language Modeling for Self-Supervised Speech Pre-Training | 2021-08-07 | Code |
| 4 | Conformer + Wav2vec 2.0 + SpecAugment-based Noisy Student Training with Libri-Light | 2.6 | SOTA | No | Pushing the Limits of Semi-Supervised Learning for Automatic Speech Recognition | 2020-10-20 | Code |
| 5 | HuBERT with Libri-Light | 2.9 | - | No | HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units | 2021-06-14 | Code |
| 6 | wav2vec 2.0 with Libri-Light | 3 | SOTA | No | wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations | 2020-06-20 | Code |
| 7 | Conv + Transformer + wav2vec2.0 + pseudo labeling | 3.1 | - | No | Self-training and Pre-training are Complementary for Speech Recognition | 2020-10-22 | Code |
| 8 | WavLM Large | 3.2 | - | No | WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing | 2021-10-26 | Code |
| 9 | SpeechStew (1B) | 3.3 | - | No | SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network | 2021-04-05 | - |
| 10 | ContextNet + SpecAugment-based Noisy Student Training with Libri-Light | 3.4 | SOTA | No | Improved Noisy Student Training for Automatic Speech Recognition | 2020-05-19 | Code |
| 11 | E-Branchformer (L) + Internal Language Model Estimation | 3.65 | - | No | E-Branchformer: Branchformer with Enhanced merging for speech recognition | 2022-09-30 | Code |
| 12 | data2vec | 3.7 | - | No | data2vec: A General Framework for Self-supervised Learning in Speech, Vision and Language | 2022-02-07 | Code |
| 13 | Conv + Transformer AM + Iterative Pseudo-Labeling (n-gram LM + Transformer Rescoring) | 3.83 | SOTA | No | Iterative Pseudo-Labeling for Speech Recognition | 2020-05-19 | Code |
| 14 | Conformer(L) | 3.9 | SOTA | Yes | Conformer: Convolution-augmented Transformer for Speech Recognition | 2020-05-16 | Code |
| 15 | Zipformer+pruned transducer w/ CR-CTC (no external language model) | 3.95 | - | No | CR-CTC: Consistency regularization on CTC for improved speech recognition | 2024-10-07 | Code |
| 16 | SpeechStew (100M) | 4 | - | No | SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network | 2021-04-05 | - |
| 17 | wav2vec 2.0 | 4.1 | - | Yes | wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations | 2020-06-20 | Code |
| 18 | ContextNet(L) | 4.1 | SOTA | No | ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context | 2020-05-07 | Code |
| 19 | Conv + Transformer AM (ConvLM with Transformer Rescoring) | 4.11 | SOTA | Yes | End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures | 2019-11-19 | Code |
| 20 | CTC + Transformer LM rescoring | 4.2 | - | Yes | Faster, Simpler and More Accurate Hybrid ASR Systems Using Wordpieces | 2020-05-19 | - |
| 21 | Transformer Transducer | 4.2 | - | Yes | Improving RNN Transducer Based ASR with Auxiliary Tasks | 2020-11-05 | Code |
| 22 | Qwen-Audio | 4.2 | - | No | Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models | 2023-11-14 | Code |
| 23 | Conformer(M) | 4.3 | - | Yes | Conformer: Convolution-augmented Transformer for Speech Recognition | 2020-05-16 | Code |
| 24 | Zipformer+CR-CTC (no external language model) | 4.35 | - | No | CR-CTC: Consistency regularization on CTC for improved speech recognition | 2024-10-07 | Code |
| 25 | Zipformer+pruned transducer (no external language model) | 4.38 | - | No | Zipformer: A faster and better encoder for automatic speech recognition | 2023-10-17 | Code |
| 26 | Multistream CNN with Self-Attentive SRU | 4.46 | - | No | ASAPP-ASR: Multistream CNN and Self-Attentive SRU for SOTA Speech Recognition | 2020-05-21 | - |
| 27 | ContextNet(M) | 4.5 | - | Yes | ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context | 2020-05-07 | Code |
| 28 | hybrid + Transformer LM rescoring | 4.85 | SOTA | Yes | Transformer-based Acoustic Modeling for Hybrid Speech Recognition | 2019-10-22 | - |
| 29 | Branchformer + GFSA | 4.94 | - | No | Graph Convolutions Enrich the Self-Attention in Transformers! | 2023-12-07 | Code |
| 30 | Hybrid model with Transformer rescoring | 5 | SOTA | No | RWTH ASR Systems for LibriSpeech: Hybrid vs Attention -- w/o Data Augmentation | 2019-05-08 | Code |
| 31 | Conformer(S) | 5 | - | Yes | Conformer: Convolution-augmented Transformer for Speech Recognition | 2020-05-16 | Code |
| 32 | Conv + Transformer AM (ConvLM with Transformer Rescoring) (LS only) | 5.18 | - | No | End-to-end ASR: from Supervised to Semi-Supervised Learning with Modern Architectures | 2019-11-19 | Code |
| 33 | ContextNet(S) | 5.5 | - | Yes | ContextNet: Improving Convolutional Neural Networks for Automatic Speech Recognition with Global Context | 2020-05-07 | Code |
| 34 | LSTM Transducer | 5.6 | - | Yes | Librispeech Transducer Model with Internal Language Model Prior Correction | 2021-04-07 | Code |
| 35 | Transformer | 5.7 | - | Yes | A Comparative Study on Transformer vs RNN in Speech Applications | 2019-09-13 | Code |
| 36 | LAS + SpecAugment | 5.8 | SOTA | Yes | SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition | 2019-04-18 | Code |
| 37 | Multi-Stream Self-Attention With Dilated 1D Convolutions | 5.8 | - | No | State-of-the-Art Speech Recognition Using Multi-Stream Self-Attention With Dilated 1D Convolutions | 2019-10-01 | Code |
| 38 | Squeezeformer (L) | 5.97 | - | No | Squeezeformer: An Efficient Transformer for Automatic Speech Recognition | 2022-06-02 | Code |
| 39 | LAS (no LM) | 6.5 | SOTA | Yes | SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition | 2019-04-18 | Code |
| 40 | Conformer with Relaxed Attention | 6.85 | - | No | Relaxed Attention: A Simple Method to Boost Performance of End-to-End Automatic Speech Recognition | 2021-07-02 | Code |
| 41 | QuartzNet15x5 | 7.25 | - | No | - | - | Code |
| 42 | tdnn + chain + rnnlm rescoring | 7.63 | - | Yes | - | - | - |
| 43 | Jasper DR 10x5 (+ Time/Freq Masks) | 7.84 | SOTA | No | Jasper: An End-to-End Convolutional Neural Acoustic Model | 2019-04-05 | Code |
| 44 | Espresso | 8.7 | - | No | Espresso: A Fast End-to-end Neural Speech Recognition Toolkit | 2019-09-18 | Code |
| 45 | Jasper DR 10x5 | 8.79 | SOTA | No | Jasper: An End-to-End Convolutional Neural Acoustic Model | 2019-04-05 | Code |
| 46 | MT4SSL | 9.6 | - | No | MT4SSL: Boosting Self-Supervised Speech Representation Learning by Integrating Multiple Targets | 2022-11-14 | Code |
| 47 | Convolutional Speech Recognition | 10.47 | SOTA | Yes | Fully Convolutional Speech Recognition | 2018-12-17 | - |
| 48 | CTC-CRF 4gram-LM | 10.65 | - | No | - | - | Code |
| 49 | TDNN + pNorm + speed up/down speech | 12.5 | - | No | - | - | - |
| 50 | Deep Speech 2 | 13.25 | SOTA | No | Deep Speech 2: End-to-End Speech Recognition in English and Mandarin | 2015-12-08 | Code |
| 51 | Local Prior Matching (Large Model, ConvLM LM) | 15.28 | - | No | Semi-Supervised Speech Recognition via Local Prior Matching | 2020-02-24 | Code |
| 52 | Snips | 16.5 | - | No | Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces | 2018-05-25 | Code |
| 53 | Local Prior Matching (Large Model) | 20.84 | - | Yes | Semi-Supervised Speech Recognition via Local Prior Matching | 2020-02-24 | Code |