Metric: Percentage error (lower is better)
| # | Model | Percentage error | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | IBM (LSTM+Conformer encoder-decoder) | 4.3 | No | On the limit of English conversational speech re... | 2021-05-03 | - |
| 2 | IBM (LSTM encoder-decoder) | 4.7 | No | Single headed attention based sequence-to-sequen... | 2020-01-20 | - |
| 3 | ResNet + BiLSTMs acoustic model | 5.5 | No | English Conversational Telephone Speech Recognit... | 2017-03-06 | - |
| 4 | Microsoft 2016b | 5.8 | No | Achieving Human Parity in Conversational Speech ... | 2016-10-17 | - |
| 5 | Microsoft 2016 | 6.2 | No | The Microsoft 2016 Conversational Speech Recogni... | 2016-09-12 | - |
| 6 | VGG/ResNet/LACE/BiLSTM acoustic model trained on SWB+Fisher+CH, N-gram + RNNLM language model trained on Switchboard+Fisher+Gigaword+Broadcast | 6.3 | No | The Microsoft 2016 Conversational Speech Recogni... | 2016-09-12 | - |
| 7 | RNN + VGG + LSTM acoustic model trained on SWB+Fisher+CH, N-gram + "model M" + NNLM language model | 6.6 | No | The IBM 2016 English Conversational Telephone Sp... | 2016-04-27 | - |
| 8 | CNN-LSTM | 6.6 | No | Achieving Human Parity in Conversational Speech ... | 2016-10-17 | - |
| 9 | IBM 2016 | 6.9 | No | The IBM 2016 English Conversational Telephone Sp... | 2016-04-27 | - |
| 10 | RNNLM | 6.9 | No | The Microsoft 2016 Conversational Speech Recogni... | 2016-09-12 | - |
| 11 | IBM 2015 | 8 | No | The IBM 2015 English Conversational Telephone Sp... | 2015-05-21 | - |
| 12 | HMM-BLSTM trained with MMI + data augmentation (speed) + iVectors + 3 regularizations + Fisher | 8.5 | No | - | - | - |
| 13 | HMM-TDNN trained with MMI + data augmentation (speed) + iVectors + 3 regularizations + Fisher (10% / 15.1% respectively trained on SWBD only) | 9.2 | No | - | - | - |
| 14 | CNN on MFSC/fbanks + 1 non-conv layer for FMLLR/I-Vectors concatenated in a DNN | 10.4 | No | - | - | - |
| 15 | HMM-TDNN + iVectors | 11 | No | - | - | - |
| 16 | CNN | 11.5 | No | - | - | - |
| 17 | Deep CNN (10 conv, 4 FC layers), multi-scale feature maps | 12.2 | No | Very Deep Multilingual Convolutional Neural Netw... | 2015-09-29 | - |
| 18 | HMM-DNN + sMBR | 12.6 | No | - | - | - |
| 19 | DNN sMBR | 12.6 | No | - | - | - |
| 20 | Deep Speech + FSH | 12.6 | No | Deep Speech: Scaling up end-to-end speech recogn... | 2014-12-17 | Code |
| 21 | CNN + Bi-RNN + CTC (speech to letters), 25.9% WER if trained only on SWB | 12.6 | No | Deep Speech: Scaling up end-to-end speech recogn... | 2014-12-17 | Code |
| 22 | DNN MMI | 12.9 | No | - | - | - |
| 23 | DNN MPE | 12.9 | No | - | - | - |
| 24 | DNN BMMI | 12.9 | No | - | - | - |
| 25 | HMM-TDNN + pNorm + speed up/down speech | 12.9 | No | - | - | - |
| 26 | DNN + Dropout | 15 | No | Building DNN Acoustic Models for Large Vocabular... | 2014-06-30 | Code |
| 27 | DNN | 16 | No | Building DNN Acoustic Models for Large Vocabular... | 2014-06-30 | Code |
| 28 | CD-DNN | 16.1 | No | - | - | - |
| 29 | DNN-HMM | 18.5 | No | - | - | - |
| 30 | Deep Speech | 20 | No | Deep Speech: Scaling up end-to-end speech recogn... | 2014-12-17 | Code |