Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

DeFT-AN: Dense Frequency-Time Attentive Network for Multichannel Speech Enhancement

Dongheon Lee, Jung-Woo Choi

2022-12-15 | Denoising | Speech Enhancement | Speech Dereverberation

Abstract

In this study, we propose a dense frequency-time attentive network (DeFT-AN) for multichannel speech enhancement. DeFT-AN is a mask estimation network that predicts a complex spectral masking pattern for suppressing the noise and reverberation embedded in the short-time Fourier transform (STFT) of an input signal. The proposed mask estimation network incorporates three different types of blocks for aggregating information in the spatial, spectral, and temporal dimensions. It utilizes a spectral transformer with a modified feed-forward network and a temporal conformer with sequential dilated convolutions. The use of dense blocks and transformers dedicated to the three different characteristics of audio signals enables more comprehensive enhancement in noisy and reverberant environments. The remarkable performance of DeFT-AN over state-of-the-art multichannel models is demonstrated based on two popular noisy and reverberant datasets in terms of various metrics for speech quality and intelligibility.
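To make the idea of complex spectral masking concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single-channel signal, a naive NumPy STFT, and an oracle mask computed from a known clean signal; in DeFT-AN the mask is predicted by the network from the multichannel input. The helper names `stft`, `ideal_complex_mask`, and `apply_complex_mask`, the frame parameters, and the test signal are all hypothetical choices made for illustration.

```python
import numpy as np

def stft(x, n_fft=512, hop=128):
    """Naive STFT: Hann-windowed frames followed by a real FFT."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1).T  # shape: (n_fft // 2 + 1, n_frames)

def ideal_complex_mask(clean_spec, noisy_spec, eps=1e-8):
    """Oracle complex ratio mask; a stand-in for what a network would predict."""
    return clean_spec / (noisy_spec + eps)

def apply_complex_mask(noisy_spec, mask):
    """Complex multiplication per bin: the mask adjusts magnitude and phase."""
    return mask * noisy_spec

# Hypothetical test signal: a 440 Hz tone in white noise, 1 s at 16 kHz.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean = np.sin(2 * np.pi * 440.0 * t)
noisy = clean + 0.3 * rng.standard_normal(t.shape)

clean_spec = stft(clean)
noisy_spec = stft(noisy)
mask = ideal_complex_mask(clean_spec, noisy_spec)
enhanced_spec = apply_complex_mask(noisy_spec, mask)
```

Because the mask is complex-valued, it can correct the phase of each time-frequency bin as well as its magnitude, which a real-valued magnitude mask cannot do.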

Results

Task                Dataset                    Metric  Value  Model
Speech Enhancement  spatialized DNS challenge  PESQ    3.01   DeFT-AN
Speech Enhancement  spatialized DNS challenge  SI-SDR  9.9    DeFT-AN
Speech Enhancement  spatialized DNS challenge  STOI    0.924  DeFT-AN
Speech Enhancement  spatialized WSJCAM0        PESQ    3.63   DeFT-AN
Speech Enhancement  spatialized WSJCAM0        SI-SDR  15.7   DeFT-AN
Speech Enhancement  spatialized WSJCAM0        STOI    0.981  DeFT-AN
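
For readers unfamiliar with the SI-SDR metric listed above, here is a minimal sketch of the commonly used scale-invariant signal-to-distortion ratio. This page does not state how the reported values were computed or in which unit, so the zero-mean normalization, the decibel scale, and the function name `si_sdr` are assumptions for illustration.

```python
import numpy as np

def si_sdr(reference, estimate, eps=1e-12):
    """Scale-invariant SDR in dB (common definition; assumed, not from the page)."""
    reference = reference - np.mean(reference)
    estimate = estimate - np.mean(estimate)
    # Project the estimate onto the reference to get the scaled target component.
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    distortion = estimate - target
    return 10.0 * np.log10(np.dot(target, target) / (np.dot(distortion, distortion) + eps))

# Hypothetical example: the distortion is orthogonal to the reference
# and has one tenth of its amplitude.
t = np.arange(1000) / 1000.0
reference = np.sin(2 * np.pi * 5 * t)
estimate = reference + 0.1 * np.cos(2 * np.pi * 5 * t)
score = si_sdr(reference, estimate)
rescaled_score = si_sdr(reference, 3.0 * estimate)
```

In this example the target-to-distortion energy ratio is 100, so the score is about 20 dB, and rescaling the estimate leaves the score unchanged.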
