Speech Recognition on AISHELL-1

Metric: Word Error Rate (WER) (lower is better)
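
The page does not say how the scores are computed, so the following is only a minimal sketch of the standard definition: the edit distance (substitutions + deletions + insertions) between reference and hypothesis, divided by the reference length. It assumes the leaderboard values are percentages. The function names `edit_distance` and `error_rate` are hypothetical and do not come from any listed toolkit. For Mandarin corpora such as AISHELL-1 the tokens are usually characters rather than words, which would correspond to passing `list(text)` instead of `text.split()`; that is also an assumption about how the listed papers score.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))  # distance from empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from reference prefix to empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref_tokens, hyp_tokens):
    """Error rate in percent: 100 * edit distance / reference length."""
    if not ref_tokens:
        raise ValueError("reference must not be empty")
    return 100.0 * edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)


# Illustrative English example (made-up sentences, not AISHELL-1 data).
ref = "the cat sat on the mat".split()
hyp = "the cat sit on mat".split()
print(edit_distance(ref, hyp), round(error_rate(ref, hyp), 1))
```

In this example the hypothesis has one substitution ("sat" to "sit") and one deletion (the second "the"), so the distance is 2 over 6 reference words, or about 33.3%.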

Results

#	Model	Word Error Rate (WER)	Extra Data	SOTA	Paper	Date	Code
1	FireRedASR-AED	0.55	Yes	Yes	FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration	2025-01-24	Code
2	Seed-ASR	0.68	Yes	Yes	Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition	2024-07-05	-
3	Qwen-Audio	1.29	Yes	Yes	Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models	2023-11-14	Code
4	MMSpeech With LM	1.9	No	Yes	MMSpeech: Multi-modal Multi-task Encoder-Decoder Pre-training for Speech Recognition	2022-11-29	Code
5	Paraformer-large	1.95	Yes	No	FunASR: A Fundamental End-to-End Speech Recognition Toolkit	2023-05-18	Code
6	Zipformer+CR-CTC (no external language model)	4.02	No	No	CR-CTC: Consistency regularization on CTC for improved speech recognition	2024-10-07	Code
7	Lightweight Transducer With LM	4.03	No	No	Lightweight Transducer Based on Frame-Level Criterion	2024-09-05	Code
8	SE-WSBO With LM	4.1	No	Yes	Improving Mandarin Speech Recogntion with Block-augmented Transformer	2022-07-24	Code
9	CIF-HKD With LM	4.1	No	No	Knowledge Transfer from Pre-trained Language Models to Cif-based Speech Recognizers via Hierarchical Distillation	2023-01-30	Code
10	Lightweight Transducer	4.31	No	No	Lightweight Transducer Based on Frame-Level Criterion	2024-09-05	Code
11	UMA	4.7	No	No	Unimodal Aggregation for CTC-based Speech Recognition	2023-09-15	Code
12	U2	4.72	No	Yes	Unified Streaming and Non-streaming Two-pass End-to-end Model for Speech Recognition	2020-12-10	Code
13	Paraformer	4.95	No	No	FunASR: A Fundamental End-to-End Speech Recognition Toolkit	2023-05-18	Code
14	BAT	4.97	No	No	BAT: Boundary aware transducer for memory-efficient and low-latency ASR	2023-05-19	Code
15	CTC-CRF 4gram-LM	6.34	No	Yes	CAT: A CTC-CRF based ASR Toolkit Bridging the Hybrid and the End-to-end Approaches towards Data Efficiency and Low Latency	2020-05-27	Code
16	BRA-E	6.63	No	No	Beyond Universal Transformer: block reusing with adaptor in Transformer for automatic speech recognition	2023-03-23	-
17	CTC/Att	6.7	No	Yes	A Comparative Study on Transformer vs RNN in Speech Applications	2019-09-13	Code
18	Att	18.7	No	Yes	End-to-end Speech Recognition with Adaptive Computation Steps	2018-08-30	-