Metric: Word Error Rate (WER) (lower is better)
| # | Model | Word Error Rate (WER) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | SpeechStew 100M | 1.3 | Yes | SpeechStew: Simply Mix All Available Speech Reco... | 2021-04-05 | - |
| 2 | ConformerXXL-P | 1.3 | No | BigSSL: Exploring the Frontier of Large-Scale Se... | 2021-09-27 | - |
| 3 | Task activating prompting generative correction | 2.11 | Yes | Generative Speech Recognition Error Correction w... | 2023-09-27 | - |
| 4 | RobustGER | 2.2 | Yes | It's Never Too Late: Fusing Acoustic Information... | 2024-02-08 | Code |
| 5 | tdnn + chain | 2.32 | No | - | - | - |
| 6 | CTC-CRF ST-NAS | 2.77 | No | Efficient Neural Architecture Search for End-to-... | 2020-11-11 | Code |
| 7 | End-to-end LF-MMI | 3 | No | - | - | - |
| 8 | Transformer with Relaxed Attention | 3.19 | No | Relaxed Attention: A Simple Method to Boost Perf... | 2021-07-02 | Code |
| 9 | CTC-CRF VGG-BLSTM | 3.2 | No | CAT: A CTC-CRF based ASR Toolkit Bridging the Hy... | 2020-05-27 | Code |
| 10 | Espresso | 3.4 | No | Espresso: A Fast End-to-end Neural Speech Recogn... | 2019-09-18 | Code |
| 11 | TC-DNN-BLSTM-DNN | 3.5 | No | Deep Recurrent Neural Networks for Acoustic Mode... | 2015-04-07 | - |
| 12 | Convolutional Speech Recognition | 3.5 | No | Fully Convolutional Speech Recognition | 2018-12-17 | - |
| 13 | HMM-DNN + pNorm* (open-vocabulary test set, i.e. harder) | 3.6 | No | - | - | - |
| 14 | Deep Speech 2 | 3.6 | Yes | Deep Speech 2: End-to-End Speech Recognition in ... | 2015-12-08 | Code |
| 15 | CTC-CRF 4gram-LM | 3.79 | No | - | - | Code |
| 16 | CNN over RAW speech (wav) | 5.6 | No | - | - | - |
| 17 | Jasper 10x3 | 6.9 | No | Jasper: An End-to-End Convolutional Neural Acous... | 2019-04-05 | Code |
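
For readers unfamiliar with the metric, WER is conventionally defined as the number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the scoring script used by any entry above. The function name `word_error_rate` is hypothetical, tokenization is plain whitespace splitting, and the result is returned as a percentage on the assumption that the table reports WER in percent.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER in percent: 100 * (S + D + I) / N, where N is the reference word count.

    Hypothetical helper for illustration; real scoring tools also apply
    text normalization, which is omitted here.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because the denominator is the reference length, a hypothesis with many insertions can score above 100, which is why the metric is read as "lower is better" rather than as an accuracy.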