CleanMel: Mel-Spectrogram Enhancement for Improving Both Speech Quality and ASR

Nian Shao, Rui Zhou, Pengyu Wang, Xian Li, Ying Fang, Yujie Yang, Xiaofei Li

2025-02-27 | Tasks: Speech Recognition, Denoising, Automatic Speech Recognition (ASR), Speech Enhancement


Abstract

In this work, we propose CleanMel, a single-channel Mel-spectrogram denoising and dereverberation network for improving both speech quality and automatic speech recognition (ASR) performance. The network takes the noisy and reverberant microphone recording as input and predicts the corresponding clean Mel-spectrogram. The enhanced Mel-spectrogram can either be converted to a speech waveform with a neural vocoder or be used directly for ASR. The network interleaves cross-band and narrow-band processing in the Mel-frequency domain, to learn the full-band spectral patterns and the narrow-band properties of the signals, respectively. Compared with linear-frequency or time-domain speech enhancement, the key advantage of Mel-spectrogram enhancement is that the Mel-frequency representation presents speech more compactly and is therefore easier to learn, which benefits both speech quality and ASR. Experimental results on four English datasets and one Chinese dataset demonstrate that the proposed model significantly improves both speech quality and ASR performance. Code and audio examples are available online at https://audio.westlake.edu.cn/Research/CleanMel.html.
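The compactness argument in the abstract can be made concrete by computing a log-Mel spectrogram and comparing its size with the linear-frequency STFT it is derived from. The sketch below is a minimal NumPy illustration, assuming the HTK-style Mel formula, 16 kHz audio, a 512-point FFT, a hop of 128 samples, and 80 Mel bands; these are illustrative choices, not CleanMel's documented front-end configuration, and the function names are hypothetical.

```python
import numpy as np

def hz_to_mel(f):
    # HTK-style Mel scale (assumed variant; other Mel formulas exist)
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr, n_fft, n_mels):
    """Triangular filters mapping n_fft // 2 + 1 linear bins to n_mels Mel bins."""
    n_bins = n_fft // 2 + 1
    fft_freqs = np.linspace(0.0, sr / 2.0, n_bins)
    # n_mels + 2 points equally spaced on the Mel scale give the filter edges
    hz_points = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    fb = np.zeros((n_mels, n_bins))
    for m in range(n_mels):
        left, center, right = hz_points[m], hz_points[m + 1], hz_points[m + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[m] = np.maximum(0.0, np.minimum(rising, falling))
    return fb

def log_mel_spectrogram(wav, sr=16000, n_fft=512, hop=128, n_mels=80, eps=1e-5):
    window = np.hanning(n_fft)
    n_frames = 1 + (len(wav) - n_fft) // hop
    frames = np.stack([wav[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=-1)) ** 2  # (T, 257) linear bins
    mel = power @ mel_filterbank(sr, n_fft, n_mels).T            # (T, 80) Mel bins
    return np.log(np.maximum(mel, eps))                          # floor avoids log(0)

rng = np.random.default_rng(0)
wav = rng.standard_normal(16000)  # 1 s of white noise as a stand-in signal
logmel = log_mel_spectrogram(wav)
print(logmel.shape)  # (122, 80)
```

With these settings each frame shrinks from 257 linear-frequency bins to 80 Mel bins, with fine resolution kept at low frequencies where speech harmonics are densest; this is one reading of "more compact" and not a claim about the paper's exact settings.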
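The abstract describes interleaving cross-band processing (mixing across Mel bands within a frame) with narrow-band processing (modelling time within each band). The toy NumPy sketch below illustrates only that axis pattern on a (T, F, C) feature tensor. The block internals (a frequency-axis convolution and a tanh recurrent cell with random weights), the names `cross_band_block`, `narrow_band_block`, and `interleaved_stack`, and all sizes are hypothetical stand-ins, not CleanMel's actual layers.

```python
import numpy as np

def cross_band_block(x, kernel):
    """Hypothetical cross-band block: mixes across frequency, each frame independently.

    x: (T, F, C); kernel: (K, C, C) with odd K, used as a 1-D conv along F.
    """
    T, F, C = x.shape
    pad = kernel.shape[0] // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (0, 0)))  # zero-pad the frequency axis
    out = np.zeros_like(x)
    for k in range(kernel.shape[0]):
        out += xp[:, k : k + F, :] @ kernel[k]  # shift along F, mix channels
    return x + np.tanh(out)  # residual connection

def narrow_band_block(x, w_in, w_rec):
    """Hypothetical narrow-band block: recurrence over time, each band independently.

    All F bands share the same weights and are processed as a batch.
    x: (T, F, C); w_in, w_rec: (C, C).
    """
    T, F, C = x.shape
    h = np.zeros((F, C))
    out = np.empty_like(x)
    for t in range(T):
        h = np.tanh(x[t] @ w_in + h @ w_rec)  # x[t] is (F, C): one row per band
        out[t] = h
    return x + out  # residual connection

def interleaved_stack(x, params):
    # Alternate the two views of the same tensor, layer by layer
    for kernel, w_in, w_rec in params:
        x = cross_band_block(x, kernel)
        x = narrow_band_block(x, w_in, w_rec)
    return x

rng = np.random.default_rng(0)
T, F, C, K, L = 50, 80, 16, 5, 2  # frames, Mel bands, channels, kernel size, layers
params = [(0.1 * rng.standard_normal((K, C, C)),
           0.1 * rng.standard_normal((C, C)),
           0.1 * rng.standard_normal((C, C))) for _ in range(L)]
x = rng.standard_normal((T, F, C))
y = interleaved_stack(x, params)
print(y.shape)  # (50, 80, 16)
```

In this sketch the narrow-band block never lets one band see another, and the cross-band block never lets one frame see another; stacking them alternately is what gives every output position both full-band spectral context and temporal context.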

Results

Task	Dataset	Metric	Value	Model
Speech Recognition	RealMAN	CER	14.4	CleanMel-L-mask
Speech Enhancement	RealMAN	DNSMOS	3.82	CleanMel-L-map
Speech Enhancement	RealMAN	DNSMOS BAK	4.03	CleanMel-L-map
Speech Enhancement	RealMAN	DNSMOS OVRL	3.25	CleanMel-L-map
Speech Enhancement	RealMAN	DNSMOS SIG	3.55	CleanMel-L-map
Speech Enhancement	RealMAN	PESQ-WB	2.1	CleanMel-L-map
Automatic Speech Recognition (ASR)	RealMAN	CER	14.4	CleanMel-L-mask
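The ASR rows above report character error rate (CER): the minimum number of character substitutions, deletions, and insertions needed to turn the recognized text into the reference, divided by the number of reference characters. Character-level scoring is the usual choice for Chinese ASR. The sketch below computes a corpus-level CER in pure Python; the function names and the example sentences are hypothetical, and real evaluations typically normalize text (punctuation, whitespace) first, which is not shown here.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (unit cost for S, D, I)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(prev[j] + 1,            # deletion
                          curr[j - 1] + 1,        # insertion
                          prev[j - 1] + (r != h)) # substitution or match
        prev = curr
    return prev[-1]

def cer(refs, hyps):
    """Corpus-level CER in percent: total edits / total reference characters."""
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    chars = sum(len(r) for r in refs)
    return 100.0 * errors / chars

refs = ["今天天气很好", "我们去公园"]
hyps = ["今天天很好", "我们去公园"]  # one deleted character in the first utterance
score = cer(refs, hyps)
print(f"{score:.2f}")  # 9.09
```

Summing edits and reference lengths over the whole corpus, rather than averaging per-utterance rates, keeps short utterances from dominating the score.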
