| 1 | United Med ASR | 0.985 | Yes | High-precision medical speech recognition throug... | 2024-11-24 | - |
| 2 | SAMBA ASR | 1.17 | Yes | Samba-ASR: State-Of-The-Art Speech Recognition L... | 2025-01-06 | - |
| 3 | FAdam | 1.34 | Yes | FAdam: Adam is a natural gradient optimizer usin... | 2024-05-21 | Code |
| 4 | Conformer + Wav2vec 2.0 + SpecAugment-based Noisy Student Training with Libri-Light | 1.4 | Yes | Pushing the Limits of Semi-Supervised Learning f... | 2020-10-20 | Code |
| 5 | w2v-BERT XXL | 1.4 | Yes | W2v-BERT: Combining Contrastive Learning and Mas... | 2021-08-07 | Code |
| 6 | parakeet-rnnt-1.1b | 1.46 | Yes | Fast Conformer with Linearly Scalable Attention ... | 2023-05-08 | - |
| 7 | Conv + Transformer + wav2vec2.0 + pseudo labeling | 1.5 | Yes | Self-training and Pre-training are Complementary... | 2020-10-22 | Code |
| 8 | ContextNet + SpecAugment-based Noisy Student Training with Libri-Light | 1.7 | Yes | Improved Noisy Student Training for Automatic Sp... | 2020-05-19 | Code |
| 9 | SpeechStew (1B) | 1.7 | Yes | SpeechStew: Simply Mix All Available Speech Reco... | 2021-04-05 | - |
| 10 | Multistream CNN with Self-Attentive SRU (WER includes text normalization) | 1.75 | Yes | ASAPP-ASR: Multistream CNN and Self-Attentive SR... | 2020-05-21 | - |
| 11 | Stateformer | 1.76 | No | Multi-Head State Space Model for Speech Recognit... | 2023-05-21 | - |
| 12 | wav2vec 2.0 with Libri-Light | 1.8 | Yes | wav2vec 2.0: A Framework for Self-Supervised Lea... | 2020-06-20 | Code |
| 13 | HuBERT with Libri-Light | 1.8 | Yes | HuBERT: Self-Supervised Speech Representation Le... | 2021-06-14 | Code |
| 14 | WavLM Large | 1.8 | No | WavLM: Large-Scale Self-Supervised Pre-Training ... | 2021-10-26 | Code |
| 15 | E-Branchformer (L) + Internal Language Model Estimation | 1.81 | No | E-Branchformer: Branchformer with Enhanced mergi... | 2022-09-30 | Code |
| 16 | Zipformer+pruned transducer w/ CR-CTC (no external language model) | 1.88 | No | CR-CTC: Consistency regularization on CTC for im... | 2024-10-07 | Code |
| 17 | ContextNet(L) | 1.9 | No | ContextNet: Improving Convolutional Neural Netwo... | 2020-05-07 | Code |
| 18 | Conformer(L) | 1.9 | No | Conformer: Convolution-augmented Transformer for... | 2020-05-16 | Code |
| 19 | Transformer+Time reduction+Self Knowledge distillation | 1.9 | No | Transformer-based ASR Incorporating Time-reducti... | 2021-03-17 | - |
| 20 | ContextNet(M) | 2 | Yes | ContextNet: Improving Convolutional Neural Netwo... | 2020-05-07 | Code |
| 21 | Transformer Transducer | 2 | No | Improving RNN Transducer Based ASR with Auxiliar... | 2020-11-05 | Code |
| 22 | Conformer(M) | 2 | Yes | Conformer: Convolution-augmented Transformer for... | 2020-05-16 | Code |
| 23 | SpeechStew (100M) | 2 | No | SpeechStew: Simply Mix All Available Speech Reco... | 2021-04-05 | - |
| 24 | Qwen-Audio | 2 | No | Qwen-Audio: Advancing Universal Audio Understand... | 2023-11-14 | Code |
| 25 | Zipformer+pruned transducer (no external language model) | 2 | No | Zipformer: A faster and better encoder for autom... | 2023-10-17 | Code |
| 26 | Zipformer+CR-CTC (no external language model) | 2.02 | No | CR-CTC: Consistency regularization on CTC for im... | 2024-10-07 | Code |
| 27 | Conv + Transformer AM + Pseudo-Labeling (ConvLM with Transformer Rescoring) | 2.03 | No | End-to-end ASR: from Supervised to Semi-Supervis... | 2019-11-19 | Code |
| 28 | Conv + Transformer AM + Iterative Pseudo-Labeling (n-gram LM + Transformer Rescoring) | 2.1 | No | Iterative Pseudo-Labeling for Speech Recognition | 2020-05-19 | Code |
| 29 | CTC + Transformer LM rescoring | 2.1 | No | Faster, Simpler and More Accurate Hybrid ASR Sys... | 2020-05-19 | - |
| 30 | Conformer(S) | 2.1 | No | Conformer: Convolution-augmented Transformer for... | 2020-05-16 | Code |
| 31 | Branchformer + GFSA | 2.11 | No | Graph Convolutions Enrich the Self-Attention in ... | 2023-12-07 | Code |
| 32 | Multi-Stream Self-Attention With Dilated 1D Convolutions | 2.2 | No | State-of-the-Art Speech Recognition Using Multi-... | 2019-10-01 | Code |
| 33 | LSTM Transducer | 2.23 | Yes | Librispeech Transducer Model with Internal Langu... | 2021-04-07 | Code |
| 34 | Hybrid + Transformer LM rescoring | 2.26 | No | Transformer-based Acoustic Modeling for Hybrid S... | 2019-10-22 | - |
| 35 | Hybrid model with Transformer rescoring | 2.3 | No | RWTH ASR Systems for LibriSpeech: Hybrid vs Atte... | 2019-05-08 | Code |
| 36 | ContextNet(S) | 2.3 | Yes | ContextNet: Improving Convolutional Neural Netwo... | 2020-05-07 | Code |
| 37 | Conv + Transformer AM (ConvLM with Transformer Rescoring) (LS only) | 2.31 | No | End-to-end ASR: from Supervised to Semi-Supervis... | 2019-11-19 | Code |
| 38 | Squeezeformer (L) | 2.47 | No | Squeezeformer: An Efficient Transformer for Auto... | 2022-06-02 | Code |
| 39 | LAS + SpecAugment | 2.5 | Yes | SpecAugment: A Simple Data Augmentation Method f... | 2019-04-18 | Code |
| 40 | Transformer | 2.6 | Yes | A Comparative Study on Transformer vs RNN in Spe... | 2019-09-13 | Code |
| 41 | QuartzNet15x5 | 2.69 | No | - | - | Code |
| 42 | LAS (no LM) | 2.7 | Yes | SpecAugment: A Simple Data Augmentation Method f... | 2019-04-18 | Code |
| 43 | wav2vec_wav2letter | 2.7 | No | Self-training and Pre-training are Complementary... | 2020-10-22 | Code |
| 44 | Espresso | 2.8 | No | Espresso: A Fast End-to-end Neural Speech Recogn... | 2019-09-18 | Code |
| 45 | Jasper DR 10x5 (+ Time/Freq Masks) | 2.84 | No | Jasper: An End-to-End Convolutional Neural Acous... | 2019-04-05 | Code |
| 46 | Jasper DR 10x5 | 2.95 | No | Jasper: An End-to-End Convolutional Neural Acous... | 2019-04-05 | Code |
| 47 | tdnn + chain + rnnlm rescoring | 3.06 | No | - | - | - |
| 48 | Convolutional Speech Recognition | 3.26 | Yes | Fully Convolutional Speech Recognition | 2018-12-17 | - |
| 49 | MT4SSL | 3.4 | No | MT4SSL: Boosting Self-Supervised Speech Represen... | 2022-11-14 | Code |
| 50 | Model Unit Exploration | 3.6 | No | On the Choice of Modeling Unit for Sequence-to-S... | 2019-02-05 | Code |
| 51 | Seq-to-seq attention | 3.82 | Yes | Improved training of end-to-end attention models... | 2018-05-08 | Code |
| 52 | CTC-CRF 4gram-LM | 4.09 | No | - | - | Code |
| 53 | HMM-TDNN trained with MMI + data augmentation (speed) + iVectors + 3 regularizations | 4.3 | No | - | - | - |
| 54 | Centaurus (30 M) | 4.4 | No | Let SSMs be ConvNets: State-space Modeling with ... | 2025-01-22 | - |
| 55 | HMM-TDNN + iVectors | 4.8 | Yes | - | - | - |
| 56 | Gated ConvNets | 4.8 | No | Letter-Based Speech Recognition with Gated ConvN... | 2017-12-22 | Code |
| 57 | Deep Speech 2 | 5.33 | No | Deep Speech 2: End-to-End Speech Recognition in ... | 2015-12-08 | Code |
| 58 | CTC + policy learning | 5.42 | No | Improving End-to-End Speech Recognition with Pol... | 2017-12-19 | - |
| 59 | HMM-DNN + pNorm* | 5.5 | Yes | - | - | - |
| 60 | Li-GRU | 6.2 | No | The PyTorch-Kaldi Speech Recognition Toolkit | 2018-11-19 | Code |
| 61 | Snips | 6.4 | No | Snips Voice Platform: an embedded Spoken Languag... | 2018-05-25 | Code |
| 62 | Local Prior Matching (Large Model) | 7.19 | No | Semi-Supervised Speech Recognition via Local Pri... | 2020-02-24 | Code |
| 63 | HMM-(SAT)GMM | 8 | Yes | - | - | - |
| 64 | AmNet | 8.6 | No | Amortized Neural Networks for Low-Latency Speech... | 2021-08-03 | - |