| 1 | SAMBA ASR | 2.48 | No | Samba-ASR: State-Of-The-Art Speech Recognition L... | 2025-01-06 | - |
| 2 | FAdam | 2.49 | No | FAdam: Adam is a natural gradient optimizer usin... | 2024-05-21 | Code |
| 3 | w2v-BERT XXL | 2.5 | No | W2v-BERT: Combining Contrastive Learning and Mas... | 2021-08-07 | Code |
| 4 | Conformer + Wav2vec 2.0 + SpecAugment-based Noisy Student Training with Libri-Light | 2.6 | No | Pushing the Limits of Semi-Supervised Learning f... | 2020-10-20 | Code |
| 5 | HuBERT with Libri-Light | 2.9 | No | HuBERT: Self-Supervised Speech Representation Le... | 2021-06-14 | Code |
| 6 | wav2vec 2.0 with Libri-Light | 3 | No | wav2vec 2.0: A Framework for Self-Supervised Lea... | 2020-06-20 | Code |
| 7 | Conv + Transformer + wav2vec2.0 + pseudo labeling | 3.1 | No | Self-training and Pre-training are Complementary... | 2020-10-22 | Code |
| 8 | WavLM Large | 3.2 | No | WavLM: Large-Scale Self-Supervised Pre-Training ... | 2021-10-26 | Code |
| 9 | SpeechStew (1B) | 3.3 | No | SpeechStew: Simply Mix All Available Speech Reco... | 2021-04-05 | - |
| 10 | ContextNet + SpecAugment-based Noisy Student Training with Libri-Light | 3.4 | No | Improved Noisy Student Training for Automatic Sp... | 2020-05-19 | Code |
| 11 | E-Branchformer (L) + Internal Language Model Estimation | 3.65 | No | E-Branchformer: Branchformer with Enhanced mergi... | 2022-09-30 | Code |
| 12 | data2vec | 3.7 | No | data2vec: A General Framework for Self-supervise... | 2022-02-07 | Code |
| 13 | Conv + Transformer AM + Iterative Pseudo-Labeling (n-gram LM + Transformer Rescoring) | 3.83 | No | Iterative Pseudo-Labeling for Speech Recognition | 2020-05-19 | Code |
| 14 | Conformer(L) | 3.9 | Yes | Conformer: Convolution-augmented Transformer for... | 2020-05-16 | Code |
| 15 | Zipformer+pruned transducer w/ CR-CTC (no external language model) | 3.95 | No | CR-CTC: Consistency regularization on CTC for im... | 2024-10-07 | Code |
| 16 | SpeechStew (100M) | 4 | No | SpeechStew: Simply Mix All Available Speech Reco... | 2021-04-05 | - |
| 17 | wav2vec 2.0 | 4.1 | Yes | wav2vec 2.0: A Framework for Self-Supervised Lea... | 2020-06-20 | Code |
| 18 | ContextNet(L) | 4.1 | No | ContextNet: Improving Convolutional Neural Netwo... | 2020-05-07 | Code |
| 19 | Conv + Transformer AM (ConvLM with Transformer Rescoring) | 4.11 | Yes | End-to-end ASR: from Supervised to Semi-Supervis... | 2019-11-19 | Code |
| 20 | CTC + Transformer LM rescoring | 4.2 | Yes | Faster, Simpler and More Accurate Hybrid ASR Sys... | 2020-05-19 | - |
| 21 | Transformer Transducer | 4.2 | Yes | Improving RNN Transducer Based ASR with Auxiliar... | 2020-11-05 | Code |
| 22 | Qwen-Audio | 4.2 | No | Qwen-Audio: Advancing Universal Audio Understand... | 2023-11-14 | Code |
| 23 | Conformer(M) | 4.3 | Yes | Conformer: Convolution-augmented Transformer for... | 2020-05-16 | Code |
| 24 | Zipformer+CR-CTC (no external language model) | 4.35 | No | CR-CTC: Consistency regularization on CTC for im... | 2024-10-07 | Code |
| 25 | Zipformer+pruned transducer (no external language model) | 4.38 | No | Zipformer: A faster and better encoder for autom... | 2023-10-17 | Code |
| 26 | Multistream CNN with Self-Attentive SRU | 4.46 | No | ASAPP-ASR: Multistream CNN and Self-Attentive SR... | 2020-05-21 | - |
| 27 | ContextNet(M) | 4.5 | Yes | ContextNet: Improving Convolutional Neural Netwo... | 2020-05-07 | Code |
| 28 | hybrid + Transformer LM rescoring | 4.85 | Yes | Transformer-based Acoustic Modeling for Hybrid S... | 2019-10-22 | - |
| 29 | Branchformer + GFSA | 4.94 | No | Graph Convolutions Enrich the Self-Attention in ... | 2023-12-07 | Code |
| 30 | Hybrid model with Transformer rescoring | 5 | No | RWTH ASR Systems for LibriSpeech: Hybrid vs Atte... | 2019-05-08 | Code |
| 31 | Conformer(S) | 5 | Yes | Conformer: Convolution-augmented Transformer for... | 2020-05-16 | Code |
| 32 | Conv + Transformer AM (ConvLM with Transformer Rescoring) (LS only) | 5.18 | No | End-to-end ASR: from Supervised to Semi-Supervis... | 2019-11-19 | Code |
| 33 | ContextNet(S) | 5.5 | Yes | ContextNet: Improving Convolutional Neural Netwo... | 2020-05-07 | Code |
| 34 | LSTM Transducer | 5.6 | Yes | Librispeech Transducer Model with Internal Langu... | 2021-04-07 | Code |
| 35 | Transformer | 5.7 | Yes | A Comparative Study on Transformer vs RNN in Spe... | 2019-09-13 | Code |
| 36 | LAS + SpecAugment | 5.8 | Yes | SpecAugment: A Simple Data Augmentation Method f... | 2019-04-18 | Code |
| 37 | Multi-Stream Self-Attention With Dilated 1D Convolutions | 5.8 | No | State-of-the-Art Speech Recognition Using Multi-... | 2019-10-01 | Code |
| 38 | Squeezeformer (L) | 5.97 | No | Squeezeformer: An Efficient Transformer for Auto... | 2022-06-02 | Code |
| 39 | LAS (no LM) | 6.5 | Yes | SpecAugment: A Simple Data Augmentation Method f... | 2019-04-18 | Code |
| 40 | Conformer with Relaxed Attention | 6.85 | No | Relaxed Attention: A Simple Method to Boost Perf... | 2021-07-02 | Code |
| 41 | QuartzNet15x5 | 7.25 | No | - | - | Code |
| 42 | tdnn + chain + rnnlm rescoring | 7.63 | Yes | - | - | - |
| 43 | Jasper DR 10x5 (+ Time/Freq Masks) | 7.84 | No | Jasper: An End-to-End Convolutional Neural Acous... | 2019-04-05 | Code |
| 44 | Espresso | 8.7 | No | Espresso: A Fast End-to-end Neural Speech Recogn... | 2019-09-18 | Code |
| 45 | Jasper DR 10x5 | 8.79 | No | Jasper: An End-to-End Convolutional Neural Acous... | 2019-04-05 | Code |
| 46 | MT4SSL | 9.6 | No | MT4SSL: Boosting Self-Supervised Speech Represen... | 2022-11-14 | Code |
| 47 | Convolutional Speech Recognition | 10.47 | Yes | Fully Convolutional Speech Recognition | 2018-12-17 | - |
| 48 | CTC-CRF 4gram-LM | 10.65 | No | - | - | Code |
| 49 | TDNN + pNorm + speed up/down speech | 12.5 | No | - | - | - |
| 50 | Deep Speech 2 | 13.25 | No | Deep Speech 2: End-to-End Speech Recognition in ... | 2015-12-08 | Code |
| 51 | Local Prior Matching (Large Model, ConvLM LM) | 15.28 | No | Semi-Supervised Speech Recognition via Local Pri... | 2020-02-24 | Code |
| 52 | Snips | 16.5 | No | Snips Voice Platform: an embedded Spoken Languag... | 2018-05-25 | Code |
| 53 | Local Prior Matching (Large Model) | 20.84 | Yes | Semi-Supervised Speech Recognition via Local Pri... | 2020-02-24 | Code |