TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Building state-of-the-art distant speech recognition using...

Building state-of-the-art distant speech recognition using the CHiME-4 challenge with a setup of speech enhancement baseline

Szu-Jui Chen, Aswin Shanmugam Subramanian, Hainan Xu, Shinji Watanabe

2018-03-27Speech RecognitionAutomatic Speech RecognitionAutomatic Speech Recognition (ASR)speech-recognitionDistant Speech RecognitionSpeech EnhancementLanguage Modelling
PaperPDF

Abstract

This paper describes a new baseline system for automatic speech recognition (ASR) in the CHiME-4 challenge to promote the development of noisy ASR in speech processing communities by providing 1) state-of-the-art system with a simplified single system comparable to the complicated top systems in the challenge, 2) publicly available and reproducible recipe through the main repository in the Kaldi speech recognition toolkit. The proposed system adopts generalized eigenvalue beamforming with bidirectional long short-term memory (LSTM) mask estimation. We also propose to use a time delay neural network (TDNN) based on the lattice-free version of the maximum mutual information (LF-MMI) trained with augmented all six microphones plus the enhanced data after beamforming. Finally, we use a LSTM language model for lattice and n-best re-scoring. The final system achieved 2.74\% WER for the real test set in the 6-channel track, which corresponds to the 2nd place in the challenge. In addition, the proposed baseline recipe includes four different speech enhancement measures, short-time objective intelligibility measure (STOI), extended STOI (eSTOI), perceptual evaluation of speech quality (PESQ) and speech distortion ratio (SDR) for the simulation test set. Thus, the recipe also provides an experimental platform for speech enhancement studies with these performance measures.

Results

TaskDatasetMetricValueModel
Speech RecognitionCHiME-4 real 6chWord Error Rate (WER)2.74HMM-TDNN(LFMMI) + LSTMLM + NN-GEV
Speech RecognitionCHiME realPercentage error11.4HMM-TDNN(LFMMI) + LSTMLM

Related Papers

Visual-Language Model Knowledge Distillation Method for Image Quality Assessment2025-07-21Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine2025-07-17NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech2025-07-17Autoregressive Speech Enhancement via Acoustic Tokens2025-07-17Making Language Model a Hierarchical Classifier and Generator2025-07-17VisionThink: Smart and Efficient Vision Language Model via Reinforcement Learning2025-07-17The Generative Energy Arena (GEA): Incorporating Energy Awareness in Large Language Model (LLM) Human Evaluations2025-07-17Inverse Reinforcement Learning Meets Large Language Model Post-Training: Basics, Advances, and Opportunities2025-07-17