Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Schrödinger Bridge for Generative Speech Enhancement

Ante Jukić, Roman Korostik, Jagadeesh Balam, Boris Ginsburg

2024-07-22 | Denoising, Speech Enhancement, Speech Dereverberation, Speech Denoising

Abstract

This paper proposes a generative speech enhancement model based on the Schrödinger bridge (SB). The proposed model employs a tractable SB to formulate a data-to-data process between the clean speech distribution and the observed noisy speech distribution. The model is trained with a data prediction loss, aiming to recover the complex-valued clean speech coefficients, and an auxiliary time-domain loss is used to improve training of the model. The effectiveness of the proposed SB-based model is evaluated in two different speech enhancement tasks: speech denoising and speech dereverberation. The experimental results demonstrate that the proposed SB-based model outperforms diffusion-based models in terms of speech quality metrics and ASR performance, e.g., resulting in a relative word error rate reduction of 20% for denoising and 6% for dereverberation compared to the best baseline model. The proposed model also demonstrates improved efficiency, achieving better quality than the baselines for the same number of sampling steps and with a reduced computational cost.
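The abstract reports ASR gains as relative word error rate (WER) reductions. As a reminder of how such a figure is usually computed, here is a minimal sketch. The function name and the WER values in the usage lines are hypothetical and chosen only for illustration; they are not numbers from the paper.

```python
def relative_wer_reduction(baseline_wer: float, model_wer: float) -> float:
    """Relative WER reduction, expressed as a fraction of the baseline WER.

    Both arguments must use the same unit (for example, percent).
    """
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - model_wer) / baseline_wer


# Hypothetical values, NOT taken from the paper: a baseline at 10.0% WER
# and an enhanced system at 8.0% WER.
reduction = relative_wer_reduction(baseline_wer=10.0, model_wer=8.0)
print(f"{reduction:.0%}")  # prints "20%"
```

Note that a relative reduction of 20% says nothing about the absolute WER values; the same figure results from going from 10.0% to 8.0% or from 5.0% to 4.0%.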

Results

Task                Dataset    Metric   Value  Model
Speech Enhancement  EARS-WHAM  DNSMOS   3.83   Schrödinger Bridge
Speech Enhancement  EARS-WHAM  ESTOI    0.73   Schrödinger Bridge
Speech Enhancement  EARS-WHAM  PESQ-WB  2.33   Schrödinger Bridge
Speech Enhancement  EARS-WHAM  POLQA    3.46   Schrödinger Bridge
Speech Enhancement  EARS-WHAM  SI-SDR   17.85  Schrödinger Bridge
Speech Enhancement  EARS-WHAM  SIGMOS   3.44   Schrödinger Bridge
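
Of the metrics listed, SI-SDR (scale-invariant signal-to-distortion ratio, in dB) is the one with a simple closed-form definition. The sketch below follows the commonly used definition with zero-mean signals. It is an illustration only: the function name, the `eps` stabilizer, and the synthetic test signal are assumptions, and this is not necessarily the evaluation code behind the value reported above.

```python
import numpy as np


def si_sdr(estimate, reference, eps=1e-8):
    """Scale-invariant SDR in dB between two 1-D signals of equal length."""
    estimate = np.asarray(estimate, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)
    # Remove the mean so that a constant offset does not affect the score.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the scaled target.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    # Everything in the estimate that is not the scaled target is distortion.
    distortion = estimate - target
    ratio = (np.dot(target, target) + eps) / (np.dot(distortion, distortion) + eps)
    return 10.0 * np.log10(ratio)


# Synthetic example (not speech data): a reference plus noise at 10% amplitude.
rng = np.random.default_rng(0)
reference = rng.standard_normal(16000)
estimate = reference + 0.1 * rng.standard_normal(16000)
print(f"SI-SDR: {si_sdr(estimate, reference):.2f} dB")
# Rescaling the estimate leaves the score essentially unchanged.
print(f"SI-SDR (estimate x3): {si_sdr(3.0 * estimate, reference):.2f} dB")
```

The projection step is what makes the metric scale-invariant: multiplying the estimate by a constant scales the target and the distortion by the same factor, so their energy ratio stays the same.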

Related Papers

fastWDM3D: Fast and Accurate 3D Healthy Tissue Inpainting (2025-07-17)
Diffuman4D: 4D Consistent Human View Synthesis from Sparse-View Videos with Spatio-Temporal Diffusion Models (2025-07-17)
Autoregressive Speech Enhancement via Acoustic Tokens (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
HUG-VAS: A Hierarchical NURBS-Based Generative Model for Aortic Geometry Synthesis and Controllable Editing (2025-07-15)
AirLLM: Diffusion Policy-based Adaptive LoRA for Remote Fine-Tuning of LLM over the Air (2025-07-15)
P.808 Multilingual Speech Enhancement Testing: Approach and Results of URGENT 2025 Challenge (2025-07-15)
A statistical physics framework for optimal learning (2025-07-10)