Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Monaural Speech Enhancement with Complex Convolutional Block Attention Module and Joint Time Frequency Losses

Shengkui Zhao, Trung Hieu Nguyen, Bin Ma

2021-02-03 | Speech Enhancement | Speech Denoising

Abstract

Deep complex U-Net structure and convolutional recurrent network (CRN) structure achieve state-of-the-art performance for monaural speech enhancement. Both deep complex U-Net and CRN are encoder and decoder structures with skip connections, which heavily rely on the representation power of the complex-valued convolutional layers. In this paper, we propose a complex convolutional block attention module (CCBAM) to boost the representation power of the complex-valued convolutional layers by constructing more informative features. The CCBAM is a lightweight and general module which can be easily integrated into any complex-valued convolutional layers. We integrate CCBAM with the deep complex U-Net and CRN to enhance their performance for speech enhancement. We further propose a mixed loss function to jointly optimize the complex models in both time-frequency (TF) domain and time domain. By integrating CCBAM and the mixed loss, we form a new end-to-end (E2E) complex speech enhancement framework. Ablation experiments and objective evaluations show the superior performance of the proposed approaches (https://github.com/modelscope/ClearerVoice-Studio).
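
The abstract does not spell out the mixed loss, so the following is only a minimal sketch of what a joint time-domain and TF-domain objective could look like, not the authors' implementation. It assumes the time-domain term is negative SI-SNR and the TF-domain term is a mean squared error over the real and imaginary parts of an STFT. The function names, the naive STFT, the frame settings, and the `tf_weight` parameter are all hypothetical choices made for illustration.

```python
import cmath
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-noise ratio in dB (higher is better)."""
    n = len(target)
    # Remove the mean so DC offsets do not affect the measure.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [e - est_mean for e in estimate]
    tgt = [t - tgt_mean for t in target]
    # Project the estimate onto the target to isolate the "signal" part.
    dot = sum(e * t for e, t in zip(est, tgt))
    scale = dot / (sum(t * t for t in tgt) + eps)
    signal = [scale * t for t in tgt]
    noise = [e - s for e, s in zip(est, signal)]
    signal_energy = sum(s * s for s in signal)
    noise_energy = sum(x * x for x in noise)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


def stft(signal, frame_len=64, hop=32):
    """Naive Hann-windowed STFT: a list of frames, each a list of complex bins."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len)
              for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        # Direct DFT of the non-negative frequency bins (slow but dependency-free).
        bins = [
            sum(x * cmath.exp(-2j * math.pi * k * i / frame_len)
                for i, x in enumerate(chunk))
            for k in range(frame_len // 2 + 1)
        ]
        frames.append(bins)
    return frames


def complex_mse(est_spec, tgt_spec):
    """Mean squared error over real and imaginary parts of two spectrograms."""
    total, count = 0.0, 0
    for est_frame, tgt_frame in zip(est_spec, tgt_spec):
        for e, t in zip(est_frame, tgt_frame):
            total += (e.real - t.real) ** 2 + (e.imag - t.imag) ** 2
            count += 1
    return total / count


def mixed_loss(estimate, target, tf_weight=1.0):
    """Assumed joint loss: negative SI-SNR (time) plus weighted complex MSE (TF)."""
    time_term = -si_snr(estimate, target)
    tf_term = complex_mse(stft(estimate), stft(target))
    return time_term + tf_weight * tf_term


# Toy signals: a 5-cycle sine as "clean" speech, plus a 40-cycle interferer.
clean = [math.sin(2 * math.pi * 5 * i / 256) for i in range(256)]
noisy = [c + 0.3 * math.sin(2 * math.pi * 40 * i / 256)
         for i, c in enumerate(clean)]

loss_perfect = mixed_loss(clean, clean)
loss_noisy = mixed_loss(noisy, clean)
```

In this sketch a perfect estimate yields a lower loss than the noisy one, because both terms penalise the residual interferer. A real training setup would compute the same two terms on batched tensors with a differentiable STFT; the relative weighting of the terms is a tuning choice that the abstract does not specify.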

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Speech Enhancement | WSJ0 + DEMAND + RNNoise | PESQ-NB | 3.44 | DCUNet-MC |
| Speech Enhancement | WSJ0 + DEMAND + RNNoise | PESQ-NB | 3.28 | DCCRN-M |
| Speech Enhancement | WSJ0 + DEMAND + RNNoise | PESQ-NB | 3.25 | DCUNet |
| Speech Enhancement | Deep Noise Suppression (DNS) Challenge | PESQ-WB | 3.23 | FRCRN |
| Speech Enhancement | VoiceBank + DEMAND | PESQ (wb) | 3.43 | D2Former |
| Speech Enhancement | VoiceBank + DEMAND | Para. (M) | 0.86 | D2Former |
| Speech Enhancement | DNS Challenge | PESQ-NB | 3.21 | DCCRN-MC |
| Speech Enhancement | DNS Challenge | PESQ-NB | 3.15 | DCCRN-M |
| Speech Enhancement | DNS Challenge | PESQ-NB | 3.04 | DCCRN |

Related Papers

- Autoregressive Speech Enhancement via Acoustic Tokens (2025-07-17)
- P.808 Multilingual Speech Enhancement Testing: Approach and Results of URGENT 2025 Challenge (2025-07-15)
- Robust One-step Speech Enhancement via Consistency Distillation (2025-07-08)
- Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis (2025-07-08)
- MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement (2025-07-01)
- Frequency-Weighted Training Losses for Phoneme-Level DNN-based Speech Enhancement (2025-06-23)
- EDNet: A Distortion-Agnostic Speech Enhancement Framework with Gating Mamba Mechanism and Phase Shift-Invariant Training (2025-06-19)
- A Comparative Evaluation of Deep Learning Models for Speech Enhancement in Real-World Noisy Environments (2025-06-17)