Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

CAT: A CTC-CRF based ASR Toolkit Bridging the Hybrid and the End-to-end Approaches towards Data Efficiency and Low Latency

Keyu An, Hongyu Xiang, Zhijian Ou

2020-05-27 · Speech Recognition
Paper · PDF · Code (official)

Abstract

In this paper, we present a new open source toolkit for speech recognition, named CAT (CTC-CRF based ASR Toolkit). CAT inherits the data-efficiency of the hybrid approach and the simplicity of the E2E approach, providing a full-fledged implementation of CTC-CRFs and complete training and testing scripts for a number of English and Chinese benchmarks. Experiments show CAT obtains state-of-the-art results, which are comparable to the fine-tuned hybrid models in Kaldi but with a much simpler training pipeline. Compared to existing non-modularized E2E models, CAT performs better on limited-scale datasets, demonstrating its data efficiency. Furthermore, we propose a new method called contextualized soft forgetting, which enables CAT to do streaming ASR without accuracy degradation. We hope CAT, especially the CTC-CRF based framework and software, will be of broad interest to the community, and can be further explored and improved.
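CTC-CRF builds on the CTC topology. In that topology the network emits one symbol per frame from the label set plus a blank, and a frame-level path is mapped to a label sequence by merging consecutive repeats and then deleting blanks (a mapping usually written B). The snippet below is a minimal sketch of that mapping only. It assumes string labels and a hypothetical blank token `<b>`. It is not taken from the CAT code base and does not implement the CRF part of the model.

```python
from itertools import groupby
from typing import List, Sequence

# Hypothetical blank symbol; real toolkits usually reserve an integer index instead.
BLANK = "<b>"


def ctc_collapse(path: Sequence[str], blank: str = BLANK) -> List[str]:
    """Map a frame-level CTC path to its label sequence.

    Step 1: merge runs of identical consecutive symbols.
    Step 2: drop every blank symbol.
    """
    merged = [label for label, _run in groupby(path)]
    return [label for label in merged if label != blank]


if __name__ == "__main__":
    print(ctc_collapse(["<b>", "k", "k", "<b>", "ae", "t", "t"]))  # ['k', 'ae', 't']
    print(ctc_collapse(["a", "<b>", "a"]))                          # ['a', 'a']
```

Merging happens before blank removal, so a blank between two identical labels keeps them distinct: `["a", "<b>", "a"]` collapses to `["a", "a"]`, whereas `["a", "a"]` collapses to `["a"]`.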

Results

Task | Dataset | Metric | Value (%) | Model
Speech Recognition | WSJ dev93 | Word Error Rate (WER) | 5.7 | CTC-CRF VGG-BLSTM
Speech Recognition | WSJ eval92 | Word Error Rate (WER) | 3.2 | CTC-CRF VGG-BLSTM
Speech Recognition | Hub5'00 FISHER-SWBD | Word Error Rate (WER) | 12 | CTC-CRF
Speech Recognition | Hub5'00 SwitchBoard | WER, CallHome subset | 18.4 | CTC-CRF
Speech Recognition | Hub5'00 SwitchBoard | WER, full Hub5'00 set | 14.1 | CTC-CRF
Speech Recognition | Hub5'00 SwitchBoard | WER, SwitchBoard subset | 9.7 | CTC-CRF
Speech Recognition | AISHELL-1 | Word Error Rate (WER) | 6.34 | CTC-CRF 4gram-LM
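Every result listed in the Results table is an error rate in percent. The following is a minimal sketch of corpus-level WER. It assumes whitespace-tokenized reference and hypothesis strings and applies no text normalization. WER is computed as the total token-level Levenshtein distance (substitutions + deletions + insertions) divided by the total number of reference words. Benchmark scoring pipelines normally apply their own normalization first, so the function names here are hypothetical and its numbers are illustrative only, not a reproduction of the scores above.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Token-level Levenshtein distance; every substitution, deletion and insertion costs 1."""
    # At the start of iteration i, prev[j] is the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete reference token r
                curr[j - 1] + 1,         # insert hypothesis token h
                prev[j - 1] + (r != h),  # substitute (cost 0 when tokens match)
            )
        prev = curr
    return prev[-1]


def corpus_wer(refs: List[str], hyps: List[str]) -> float:
    """Corpus-level WER in percent: total edits divided by total reference words."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same number of utterances")
    errors = sum(edit_distance(r.split(), h.split()) for r, h in zip(refs, hyps))
    words = sum(len(r.split()) for r in refs)
    if words == 0:
        raise ValueError("reference transcripts contain no words")
    return 100.0 * errors / words
```

For example, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" counts one substitution and one deletion, giving 2/6 ≈ 33.3%. Errors and reference words are summed over the whole corpus before dividing, rather than averaging per-utterance rates, so long utterances carry proportionally more weight.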

Related Papers

Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine (2025-07-17)
NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech (2025-07-17)
WhisperKit: On-device Real-time ASR with Billion-Scale Transformers (2025-07-14)
VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis (2025-07-08)
A Hybrid Machine Learning Framework for Optimizing Crop Selection via Agronomic and Economic Forecasting (2025-07-06)
First Steps Towards Voice Anonymization for Code-Switching Speech (2025-07-02)
MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement (2025-07-01)
AUTOMATIC PRONUNCIATION MISTAKE DETECTOR PROJECT REPORT (2025-06-25)