Speech Recognition on WSJ eval92

Metric: Word Error Rate (WER) (lower is better)
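
The page does not say how WER is scored. As a rough sketch, assuming the standard definition WER = (S + D + I) / N (substitutions, deletions, and insertions over the number of reference words), it can be computed with a word-level edit distance. The function name `word_error_rate` is hypothetical. The whitespace tokenization and lack of text normalization are assumptions, and the scoring used by the listed papers may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: 100 * (S + D + I) / N, via word-level edit distance.

    Assumption: words are split on whitespace with no case or punctuation
    normalization; real evaluation pipelines usually normalize text first.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("a b c d", "a x c d"))
```

Under this definition a score of 1.3 means roughly 1.3 word errors per 100 reference words, and a value above 100 is possible when the hypothesis contains many insertions.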

Results

#	Model	Word Error Rate (WER, %)	Extra Data	Paper	Date	Code
1	SpeechStew 100M	1.3	Yes	SpeechStew: Simply Mix All Available Speech Recognition Data to Train One Large Neural Network	2021-04-05	-
2	ConformerXXL-P	1.3	No	BigSSL: Exploring the Frontier of Large-Scale Semi-Supervised Learning for Automatic Speech Recognition	2021-09-27	-
3	Task activating prompting generative correction	2.11	Yes	Generative Speech Recognition Error Correction with Large Language Models and Task-Activating Prompting	2023-09-27	-
4	RobustGER	2.2	Yes	It's Never Too Late: Fusing Acoustic Information into Large Language Models for Automatic Speech Recognition	2024-02-08	Code
5	tdnn + chain	2.32	No	-	-	-
6	CTC-CRF ST-NAS	2.77	No	Efficient Neural Architecture Search for End-to-end Speech Recognition via Straight-Through Gradients	2020-11-11	Code
7	End-to-end LF-MMI	3	No	-	-	-
8	Transformer with Relaxed Attention	3.19	No	Relaxed Attention: A Simple Method to Boost Performance of End-to-End Automatic Speech Recognition	2021-07-02	Code
9	CTC-CRF VGG-BLSTM	3.2	No	CAT: A CTC-CRF based ASR Toolkit Bridging the Hybrid and the End-to-end Approaches towards Data Efficiency and Low Latency	2020-05-27	Code
10	Espresso	3.4	No	Espresso: A Fast End-to-end Neural Speech Recognition Toolkit	2019-09-18	Code
11	TC-DNN-BLSTM-DNN	3.5	No	Deep Recurrent Neural Networks for Acoustic Modelling	2015-04-07	-
12	Convolutional Speech Recognition	3.5	No	Fully Convolutional Speech Recognition	2018-12-17	-
13	test-set on open vocabulary (i.e. harder), model = HMM-DNN + pNorm*	3.6	No	-	-	-
14	Deep Speech 2	3.6	Yes	Deep Speech 2: End-to-End Speech Recognition in English and Mandarin	2015-12-08	Code
15	CTC-CRF 4gram-LM	3.79	No	-	-	Code
16	CNN over RAW speech (wav)	5.6	No	-	-	-
17	Jasper 10x3	6.9	No	Jasper: An End-to-End Convolutional Neural Acoustic Model	2019-04-05	Code