The PESQetarian: On the Relevance of Goodhart's Law for Speech Enhancement

Danilo de Oliveira, Simon Welker, Julius Richter, Timo Gerkmann

2024-06-05 · Speech Enhancement

Abstract

To obtain improved speech enhancement models, researchers often focus on increasing performance according to specific instrumental metrics. However, when the same metric is also used in a loss function to optimize a model, this may harm aspects of quality that the metric does not capture. The goal of this paper is to illustrate the risk of overfitting a speech enhancement model to the metric used for its evaluation. To this end, we introduce enhancement models that exploit the widely used PESQ (Perceptual Evaluation of Speech Quality) measure. Our "PESQetarian" model achieves a PESQ of 3.82 on VB-DMD (VoiceBank + DEMAND) while scoring very poorly in a listening experiment. Although a PESQ value of 3.82 would imply "state-of-the-art" PESQ performance on the VB-DMD benchmark, our examples show that when a model is optimized w.r.t. a metric, an evaluation on that same metric alone may be misleading. Instead, other metrics should be included in the evaluation, and the resulting performance predictions should be confirmed by listening.
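The abstract's central point, that optimizing for one metric can damage what that metric does not measure, can be illustrated with a toy sketch. The phase-blind magnitude-spectrum distance below is a hypothetical stand-in metric chosen only for illustration; it is not PESQ and not the paper's training setup, and random noise stands in for a speech signal. An estimate with scrambled phase is perfect under the toy metric, yet its waveform SNR is far worse than that of a moderately noisy estimate which the toy metric ranks lower.

```python
import numpy as np

# Hypothetical toy metric: compares only magnitude spectra, so it is blind to phase.
def magnitude_distance(ref, est):
    diff = np.abs(np.fft.rfft(ref)) - np.abs(np.fft.rfft(est))
    return float(np.mean(diff ** 2))

# Waveform SNR in dB: sensitive to the phase errors the toy metric ignores.
def snr_db(ref, est):
    err = ref - est
    return float(10 * np.log10(np.sum(ref ** 2) / np.sum(err ** 2)))

def phase_scrambled(ref, seed=0):
    """Keep the magnitude spectrum of `ref` exactly and randomize its phase."""
    rng = np.random.default_rng(seed)
    spec = np.fft.rfft(ref)
    phase = rng.uniform(-np.pi, np.pi, size=spec.shape)
    phase[0] = 0.0       # the DC bin of a real signal is real-valued
    if len(ref) % 2 == 0:
        phase[-1] = 0.0  # so is the Nyquist bin for even-length signals
    return np.fft.irfft(np.abs(spec) * np.exp(1j * phase), n=len(ref))

rng = np.random.default_rng(1)
ref = rng.standard_normal(16000)                # hypothetical stand-in for 1 s of 16 kHz speech
noisy = ref + 0.3 * rng.standard_normal(16000)  # moderately noisy estimate
scrambled = phase_scrambled(ref)                # "optimal" estimate under the toy metric

for name, est in [("noisy", noisy), ("scrambled", scrambled)]:
    print(name, magnitude_distance(ref, est), round(snr_db(ref, est), 1))
```

A model trained only to minimize `magnitude_distance` could drift toward outputs like `scrambled`; checking a second metric (here `snr_db`) exposes the problem, in line with the recommendation to evaluate with metrics other than the one being optimized.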

Results

Task	Dataset	Metric	Value	Model
Speech Enhancement	VoiceBank + DEMAND	CBAK	2.49	PESQetarian
Speech Enhancement	VoiceBank + DEMAND	COVL	3.5	PESQetarian
Speech Enhancement	VoiceBank + DEMAND	CSIG	3.63	PESQetarian
Speech Enhancement	VoiceBank + DEMAND	ESTOI	0.84	PESQetarian
Speech Enhancement	VoiceBank + DEMAND	PESQ (wb)	3.82	PESQetarian
Speech Enhancement	VoiceBank + DEMAND	Parameters (millions)	30	PESQetarian
Speech Enhancement	VoiceBank + DEMAND	SI-SDR (dB)	-19.8	PESQetarian
Speech Enhancement	VoiceBank + DEMAND	SSNR (dB)	-2.72	PESQetarian
Speech Enhancement	VoiceBank + DEMAND	STOI	0.92	PESQetarian
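The SI-SDR of -19.8 dB can be put in perspective with the commonly used scale-invariant SDR definition: the estimate is projected onto the clean reference, and the energy of that projection is compared with the energy of the remaining residual. Below is a minimal NumPy sketch of that definition (not necessarily the exact implementation behind the reported numbers), followed by a synthetic estimate constructed so that its residual carries 10^1.98 ≈ 95 times the energy of the projection, which yields exactly -19.8 dB. The random signals are hypothetical stand-ins, not PESQetarian outputs.

```python
import numpy as np

def si_sdr(reference, estimate, eps=1e-12):
    """Scale-invariant SDR in dB (sketch of the common definition)."""
    reference = reference - np.mean(reference)
    estimate = estimate - np.mean(estimate)
    # Optimal gain that scales the reference toward the estimate
    alpha = (estimate @ reference) / (reference @ reference + eps)
    target = alpha * reference
    residual = estimate - target
    return float(10 * np.log10((target @ target + eps) / (residual @ residual + eps)))

# Hypothetical construction: clean reference plus a component orthogonal to it
rng = np.random.default_rng(0)
ref = rng.standard_normal(16000)
ref -= ref.mean()
other = rng.standard_normal(16000)
other -= other.mean()
other -= (other @ ref) / (ref @ ref) * ref        # remove the part correlated with ref
other *= np.sqrt((ref @ ref) / (other @ other))   # give it the same energy as ref
k = 10 ** (19.8 / 20)                             # amplitude ratio of about 9.77
estimate = ref + k * other

print(round(si_sdr(ref, estimate), 2))  # → -19.8
```

Because the reference is rescaled by the optimal `alpha`, multiplying the estimate by any nonzero constant leaves the score unchanged, so a strongly negative value reflects output content that is not explained by the clean speech rather than a mere gain mismatch.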
