Investigating Training Objectives for Generative Speech Enhancement

Julius Richter, Danilo de Oliveira, Timo Gerkmann

2024-09-16 | Speech Enhancement

Abstract

Generative speech enhancement has recently shown promising advancements in improving speech quality in noisy environments. Multiple diffusion-based frameworks exist, each employing distinct training objectives and learning techniques. This paper aims to explain the differences between these frameworks by focusing our investigation on score-based generative models and the Schrödinger bridge. We conduct a series of comprehensive experiments to compare their performance and highlight differing training behaviors. Furthermore, we propose a novel perceptual loss function tailored for the Schrödinger bridge framework, demonstrating enhanced performance and improved perceptual quality of the enhanced speech signals. All experimental code and pre-trained models are publicly available to facilitate further research and development in this domain.

Results

Task                Dataset             Metric     Value  Model
Speech Enhancement  VoiceBank + DEMAND  PESQ (wb)  3.7    Schrödinger bridge (PESQ loss)
Speech Enhancement  EARS-WHAM           DNSMOS     3.72   Schrödinger Bridge (PESQ loss)
Speech Enhancement  EARS-WHAM           ESTOI      0.73   Schrödinger Bridge (PESQ loss)
Speech Enhancement  EARS-WHAM           PESQ-WB    3.09   Schrödinger Bridge (PESQ loss)
Speech Enhancement  EARS-WHAM           POLQA      3.71   Schrödinger Bridge (PESQ loss)
Speech Enhancement  EARS-WHAM           SI-SDR     16.29  Schrödinger Bridge (PESQ loss)
Speech Enhancement  EARS-WHAM           SIGMOS     3.18   Schrödinger Bridge (PESQ loss)
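
Of the metrics listed above, SI-SDR is the one with a simple closed form, so a short sketch may help when reading the table. The code below follows the commonly used definition of scale-invariant signal-to-distortion ratio (in dB) and is not taken from the paper's evaluation scripts. The function name `si_sdr`, the `eps` stabilizer, and the mean-removal step are assumptions made for illustration.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB (common definition; illustrative sketch only)."""
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    # Assumption: remove the mean so both signals are zero-mean.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()

    # Optimal scaling of the reference toward the estimate.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference      # part of the estimate explained by the reference
    residual = estimate - target    # everything else counts as distortion

    return 10.0 * np.log10((np.sum(target ** 2) + eps) / (np.sum(residual ** 2) + eps))


# Toy signals: the added error is orthogonal to the reference and
# carries 1/100 of its energy, so the result is about 20 dB.
clean = np.array([1.0, 0.0, -1.0, 0.0])
noisy = clean + np.array([0.0, 0.1, 0.0, -0.1])
print(f"{si_sdr(noisy, clean):.2f} dB")
```

Because the reference is rescaled before the residual is computed, multiplying the estimate by a constant gain leaves the value unchanged, which is the property that distinguishes SI-SDR from plain SNR.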

Related Papers

Autoregressive Speech Enhancement via Acoustic Tokens (2025-07-17)
P.808 Multilingual Speech Enhancement Testing: Approach and Results of URGENT 2025 Challenge (2025-07-15)
Robust One-step Speech Enhancement via Consistency Distillation (2025-07-08)
Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis (2025-07-08)
MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement (2025-07-01)
Frequency-Weighted Training Losses for Phoneme-Level DNN-based Speech Enhancement (2025-06-23)
EDNet: A Distortion-Agnostic Speech Enhancement Framework with Gating Mamba Mechanism and Phase Shift-Invariant Training (2025-06-19)
A Comparative Evaluation of Deep Learning Models for Speech Enhancement in Real-World Noisy Environments (2025-06-17)