Speech Recognition on Switchboard + Hub5'00
Metric: Percentage error, i.e. word error rate (WER) in % (lower is better)
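Word error rate is the minimum number of word substitutions, deletions and insertions needed to turn the recognizer's hypothesis into the reference transcript, divided by the number of reference words. Below is a minimal Python sketch of that definition using a word-level Levenshtein distance. The function name `word_error_rate` is hypothetical, and the sketch assumes already-normalized, whitespace-tokenized text. Official Hub5'00 scoring (typically done with NIST's sclite) applies transcript normalization first, which this sketch does not, so its output is not directly comparable to the numbers in the table.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: (S + D + I) / N * 100 (hypothetical helper)."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the first i-1 reference words
    # and the first j hypothesis words (one DP row at a time).
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # substitution (or exact match)
            )
        prev = curr

    return 100.0 * prev[-1] / len(ref)


# One substitution (sat -> sit) and one deletion (the) over 6 reference words.
print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

The rolling one-row table keeps memory linear in the hypothesis length, which matters little for single utterances but helps when scoring long concatenated transcripts.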
Results
| # | Model | WER (%) | Extra data | Paper | Date | Code | SOTA |
|---|-------|---------|------------|-------|------|------|------|
| 1 | IBM (LSTM+Conformer encoder-decoder) | 4.3 | No | On the limit of English conversational speech recognition | 2021-05-03 | - | Yes |
| 2 | IBM (LSTM encoder-decoder) | 4.7 | No | Single headed attention based sequence-to-sequence model for state-of-the-art results on Switchboard | 2020-01-20 | - | Yes |
| 3 | ResNet + BiLSTMs acoustic model | 5.5 | No | English Conversational Telephone Speech Recognition by Humans and Machines | 2017-03-06 | - | Yes |
| 4 | Microsoft 2016b | 5.8 | No | Achieving Human Parity in Conversational Speech Recognition | 2016-10-17 | - | Yes |
| 5 | Microsoft 2016 | 6.2 | No | The Microsoft 2016 Conversational Speech Recognition System | 2016-09-12 | - | Yes |
| 6 | VGG/ResNet/LACE/BiLSTM acoustic model trained on SWB+Fisher+CH; N-gram + RNNLM language model trained on Switchboard+Fisher+Gigaword+Broadcast | 6.3 | No | The Microsoft 2016 Conversational Speech Recognition System | 2016-09-12 | - | Yes |
| 7 | RNN + VGG + LSTM acoustic model trained on SWB+Fisher+CH; N-gram + "model M" + NNLM language model | 6.6 | No | The IBM 2016 English Conversational Telephone Speech Recognition System | 2016-04-27 | - | Yes |
| 8 | CNN-LSTM | 6.6 | No | Achieving Human Parity in Conversational Speech Recognition | 2016-10-17 | - | - |
| 9 | IBM 2016 | 6.9 | No | The IBM 2016 English Conversational Telephone Speech Recognition System | 2016-04-27 | - | Yes |
| 10 | RNNLM | 6.9 | No | The Microsoft 2016 Conversational Speech Recognition System | 2016-09-12 | - | - |
| 11 | IBM 2015 | 8.0 | No | The IBM 2015 English Conversational Telephone Speech Recognition System | 2015-05-21 | - | Yes |
| 12 | HMM-BLSTM trained with MMI + data augmentation (speed) + iVectors + 3 regularizations + Fisher | 8.5 | No | - | - | - | - |
| 13 | HMM-TDNN trained with MMI + data augmentation (speed) + iVectors + 3 regularizations + Fisher (10% / 15.1% respectively when trained on SWBD only) | 9.2 | No | - | - | - | - |
| 14 | CNN on MFSC/fbanks + 1 non-conv layer for FMLLR/iVectors, concatenated in a DNN | 10.4 | No | - | - | - | - |
| 15 | HMM-TDNN + iVectors | 11.0 | No | - | - | - | - |
| 16 | CNN | 11.5 | No | - | - | - | - |
| 17 | Deep CNN (10 conv, 4 FC layers), multi-scale feature maps | 12.2 | No | Very Deep Multilingual Convolutional Neural Networks for LVCSR | 2015-09-29 | - | - |
| 18 | HMM-DNN + sMBR | 12.6 | No | - | - | - | - |
| 19 | DNN sMBR | 12.6 | No | - | - | - | - |
| 20 | Deep Speech + FSH | 12.6 | No | Deep Speech: Scaling up end-to-end speech recognition | 2014-12-17 | Yes | Yes |
| 21 | CNN + Bi-RNN + CTC (speech to letters), 25.9% WER if trained only on SWB | 12.6 | No | Deep Speech: Scaling up end-to-end speech recognition | 2014-12-17 | Yes | - |
| 22 | DNN MMI | 12.9 | No | - | - | - | - |
| 23 | DNN MPE | 12.9 | No | - | - | - | - |
| 24 | DNN BMMI | 12.9 | No | - | - | - | - |
| 25 | HMM-TDNN + pNorm + speed up/down speech | 12.9 | No | - | - | - | - |
| 26 | DNN + Dropout | 15.0 | No | Building DNN Acoustic Models for Large Vocabulary Speech Recognition | 2014-06-30 | Yes | Yes |
| 27 | DNN | 16.0 | No | Building DNN Acoustic Models for Large Vocabulary Speech Recognition | 2014-06-30 | Yes | Yes |
| 28 | CD-DNN | 16.1 | No | - | - | - | - |
| 29 | DNN-HMM | 18.5 | No | - | - | - | - |
| 30 | Deep Speech | 20.0 | No | Deep Speech: Scaling up end-to-end speech recognition | 2014-12-17 | Yes | - |

Code: "Yes" means a code link is listed for the paper. SOTA: "Yes" marks entries flagged as state of the art on this leaderboard.
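Leaderboard gains are often easier to compare as relative WER reductions than as absolute differences, since 0.4 points means more at 4.7% than at 12.6%. The Python sketch below walks a hand-copied subset of the rows above in date order and reports, for each new best result, the relative reduction over the previous best. The names `RESULTS`, `relative_reduction` and `sota_progression` are hypothetical, and the progression is computed only over this subset, not over the full table.

```python
from datetime import date

# (publication date, model, WER %) hand-copied from a subset of the table above.
RESULTS = [
    (date(2014, 12, 17), "Deep Speech + FSH", 12.6),
    (date(2015, 5, 21), "IBM 2015", 8.0),
    (date(2016, 4, 27), "IBM 2016", 6.9),
    (date(2016, 10, 17), "Microsoft 2016b", 5.8),
    (date(2017, 3, 6), "ResNet + BiLSTMs acoustic model", 5.5),
    (date(2020, 1, 20), "IBM (LSTM encoder-decoder)", 4.7),
    (date(2021, 5, 3), "IBM (LSTM+Conformer encoder-decoder)", 4.3),
]


def relative_reduction(old_wer: float, new_wer: float) -> float:
    """Relative WER reduction in percent: (old - new) / old * 100."""
    return 100.0 * (old_wer - new_wer) / old_wer


def sota_progression(results):
    """Return (new_model, previous_best_model, relative_reduction_%) per improvement."""
    best = None
    steps = []
    # Tuples sort by their first field, so this iterates in date order.
    for day, model, wer in sorted(results):
        if best is None or wer < best[2]:
            if best is not None:
                steps.append((model, best[1], relative_reduction(best[2], wer)))
            best = (day, model, wer)
    return steps


for new, old, rel in sota_progression(RESULTS):
    print(f"{new}: {rel:.1f}% relative WER reduction over {old}")
```

Measured this way, the 2021 entry's 0.4-point gain over the 2020 entry is roughly an 8.5% relative reduction.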