Phase-aware Single-stage Speech Denoising and Dereverberation with U-Net

2020-06-01 | Interspeech 2020 | Denoising, Speech Enhancement, Speech Denoising

Abstract

In this work, we tackle a denoising and dereverberation problem with a single-stage framework. Although denoising and dereverberation may be considered two separate challenging tasks, so that a separate module is typically required for each, we show that a single deep network can be shared to solve both problems. To this end, we propose a new masking method called phase-aware beta-sigmoid mask (PHM), which reuses the estimated magnitude values to estimate the clean phase by respecting the triangle inequality in the complex domain between three signal components: the mixture, the source, and the rest. Two PHMs are used to deal with the direct and reverberant source, which allows controlling the proportion of reverberation in the enhanced speech at inference time. In addition, to improve the speech enhancement performance, we propose a new time-domain loss function and show a reasonable performance gain compared to MSE loss in the complex domain. Finally, to achieve real-time inference, an optimization strategy for U-Net is proposed which reduces the computational overhead by up to 88.9% compared to the naive version.
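The triangle relation mentioned in the abstract can be illustrated with plain geometry. If a complex mixture bin is the sum of a source bin and a "rest" bin, the three magnitudes form a triangle, and the law of cosines gives the size of the phase offset between the mixture and the source. The sketch below shows only that geometric step; it is not the authors' PHM implementation. The function names and the `sign` argument are hypothetical: the magnitudes alone do not determine the direction of the rotation, and the abstract does not say how that direction is estimated, so here it is simply passed in.

```python
import cmath
import math


def phase_offset_from_magnitudes(mix_mag, src_mag, rest_mag):
    """Absolute phase offset (radians) between mixture and source.

    Assumes mixture = source + rest in the complex domain, so the three
    magnitudes form a triangle and the law of cosines applies.
    """
    if mix_mag <= 0.0 or src_mag <= 0.0:
        return 0.0  # angle is undefined for a zero-length side
    cos_theta = (mix_mag ** 2 + src_mag ** 2 - rest_mag ** 2) / (2.0 * mix_mag * src_mag)
    # Clamp to guard against rounding, or magnitudes that violate the
    # triangle inequality.
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.acos(cos_theta)


def reconstruct_source(mixture, src_mag, rest_mag, sign):
    """Rotate the mixture phase by the recovered offset.

    `sign` (+1 or -1) is a hypothetical input giving the rotation direction.
    """
    theta = phase_offset_from_magnitudes(abs(mixture), src_mag, rest_mag)
    return cmath.rect(src_mag, cmath.phase(mixture) + sign * theta)
```

Given a mixture bin and estimated magnitudes for the source and the rest, `reconstruct_source` returns a complex source estimate whose phase is consistent with those magnitudes, rather than reusing the mixture phase unchanged.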

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Speech Enhancement | Deep Noise Suppression (DNS) Challenge | PESQ-NB | 3.01 | Non-Real-Time MultiScale+ |
| Speech Enhancement | Deep Noise Suppression (DNS) Challenge | SI-SDR-WB | 16.22 | Non-Real-Time MultiScale+ |
| Speech Enhancement | WHAMR! | PESQ | 1.52 | Non-Real-Time MultiScale+ |
| Speech Enhancement | WHAMR! | SI-SDR | 5.33 | Non-Real-Time MultiScale+ |
| Speech Enhancement | WHAMR! | PESQ | 3.16 | Non-Real-Time MultiScale+ |
| Speech Enhancement | WHAMR! | SI-SDR | 10.4 | Non-Real-Time MultiScale+ |
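
For readers unfamiliar with the SI-SDR metric reported above, the following is a minimal sketch of the standard scale-invariant signal-to-distortion ratio in dB. It assumes both signals are equal-length, already zero-mean lists of samples. The function name `si_sdr` is illustrative, and this page does not state which evaluation code or settings produced the listed values.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB for two equal-length sample sequences.

    Assumes both signals are already zero-mean.
    """
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    # Project the estimate onto the reference to get the scaled target.
    scale = dot / ref_energy
    target = [scale * r for r in reference]
    # Everything not explained by the scaled reference counts as distortion.
    distortion = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    distortion_energy = sum(d * d for d in distortion)
    if distortion_energy == 0.0:
        return math.inf
    return 10.0 * math.log10(target_energy / distortion_energy)
```

Because the reference is rescaled before the distortion is measured, multiplying the estimate by a constant gain leaves the score unchanged, which is what distinguishes SI-SDR from plain SDR.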
