Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Deformable Temporal Convolutional Networks for Monaural Noisy Reverberant Speech Separation

William Ravenscroft, Stefan Goetze, Thomas Hain

2022-10-27 · Speech Separation · Speech Dereverberation

Paper · PDF · Code (official) · Code

Abstract

Speech separation models are used for isolating individual speakers in many speech processing applications. Deep learning models have been shown to lead to state-of-the-art (SOTA) results on a number of speech separation benchmarks. One such class of models, known as temporal convolutional networks (TCNs), has shown promising results for speech separation tasks. A limitation of these models is that they have a fixed receptive field (RF). Recent research in speech dereverberation has shown that the optimal RF of a TCN varies with the reverberation characteristics of the speech signal. In this work, deformable convolution is proposed as a solution to allow TCN models to have dynamic RFs that can adapt to various reverberation times for reverberant speech separation. The proposed models are capable of achieving an 11.1 dB average scale-invariant signal-to-distortion ratio (SISDR) improvement over the input signal on the WHAMR benchmark. A relatively small deformable TCN model of 1.3M parameters is proposed which gives comparable separation performance to larger and more computationally complex models.
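The core idea from the abstract, a convolution whose sampling positions shift by learned fractional offsets so the effective receptive field can adapt to the signal, can be illustrated with a minimal single-channel 1-D sketch. This is an assumption-laden toy (the paper's models are multi-channel dilated TCN blocks with offsets predicted by the network; here the offsets are simply passed in), not the authors' implementation:

```python
import numpy as np

def deformable_conv1d(x, weights, offsets, dilation=1):
    """Toy deformable 1-D convolution (single channel, zero padding).

    x:       input signal, shape (T,)
    weights: kernel taps, shape (K,)
    offsets: fractional sampling offsets per output position and tap,
             shape (T, K); in a real model these are predicted by a
             small network. All-zero offsets recover an ordinary
             dilated convolution with a fixed receptive field.
    """
    T = len(x)
    K = len(weights)
    y = np.zeros(T)
    for t in range(T):
        acc = 0.0
        for k in range(K):
            # nominal dilated tap position, shifted by a learned offset
            p = t + k * dilation + offsets[t, k]
            lo = int(np.floor(p))
            frac = p - lo
            # linear interpolation makes fractional positions (and thus
            # the offsets) differentiable; zero padding outside the signal
            x_lo = x[lo] if 0 <= lo < T else 0.0
            x_hi = x[lo + 1] if 0 <= lo + 1 < T else 0.0
            acc += weights[k] * ((1 - frac) * x_lo + frac * x_hi)
        y[t] = acc
    return y
```

Because the offsets enter only through the interpolation weights, the network can learn to stretch or shrink its receptive field per time step, which is what lets a deformable TCN adapt to different reverberation times.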

Results

Task              | Dataset   | Metric                   | Value | Model
------------------|-----------|--------------------------|-------|------------------------------------------------
Speech Separation | WHAMR!    | MACs (G)                 | 3.7   | Deformable TCN + Dynamic Mixing
Speech Separation | WHAMR!    | Number of parameters (M) | 3.6   | Deformable TCN + Dynamic Mixing
Speech Separation | WHAMR!    | SDRi                     | 10.3  | Deformable TCN + Dynamic Mixing
Speech Separation | WHAMR!    | SI-SDRi                  | 11.1  | Deformable TCN + Dynamic Mixing
Speech Separation | WHAMR!    | MACs (G)                 | 3.7   | Deformable TCN + Shared Weights + Dynamic Mixing
Speech Separation | WHAMR!    | Number of parameters (M) | 1.3   | Deformable TCN + Shared Weights + Dynamic Mixing
Speech Separation | WHAMR!    | SDRi                     | 9.5   | Deformable TCN + Shared Weights + Dynamic Mixing
Speech Separation | WHAMR!    | SI-SDRi                  | 10.1  | Deformable TCN + Shared Weights + Dynamic Mixing
Speech Separation | WSJ0-2mix | MACs (G)                 | 3.7   | Deformable TCN + Dynamic Mixing
Speech Separation | WSJ0-2mix | Number of parameters (M) | 3.6   | Deformable TCN + Dynamic Mixing
Speech Separation | WSJ0-2mix | SDRi                     | 17.4  | Deformable TCN + Dynamic Mixing
Speech Separation | WSJ0-2mix | SI-SDRi                  | 17.2  | Deformable TCN + Dynamic Mixing
Speech Separation | WSJ0-2mix | MACs (G)                 | 3.7   | Deformable TCN + Shared Weights + Dynamic Mixing
Speech Separation | WSJ0-2mix | Number of parameters (M) | 1.3   | Deformable TCN + Shared Weights + Dynamic Mixing
Speech Separation | WSJ0-2mix | SDRi                     | 16.3  | Deformable TCN + Shared Weights + Dynamic Mixing
Speech Separation | WSJ0-2mix | SI-SDRi                  | 16.1  | Deformable TCN + Shared Weights + Dynamic Mixing
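The SI-SDRi numbers above are improvements of the scale-invariant signal-to-distortion ratio over the unprocessed mixture. A minimal sketch of the commonly used SI-SDR definition (projection of the estimate onto the reference as the target term; this is the standard formulation, not code taken from the paper):

```python
import numpy as np

def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB (standard definition).

    The reference is rescaled by the optimal gain alpha before
    computing the energy ratio, so the metric ignores the overall
    scale of the estimate.
    """
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference           # scaled reference component
    noise = estimate - target            # everything else is distortion
    return 10.0 * np.log10(np.sum(target ** 2) / np.sum(noise ** 2))

def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDRi: gain over feeding the raw mixture to the metric."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)
```

So an SI-SDRi of 11.1 dB on WHAMR! means the separated output scores, on average, 11.1 dB higher than the noisy reverberant input mixture itself.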

Related Papers

- Dynamic Slimmable Networks for Efficient Speech Separation (2025-07-08)
- Improving Practical Aspects of End-to-End Multi-Talker Speech Recognition for Online and Offline Scenarios (2025-06-17)
- SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline (2025-05-25)
- Attractor-Based Speech Separation of Multiple Utterances by Unknown Number of Speakers (2025-05-22)
- Single-Channel Target Speech Extraction Utilizing Distance and Room Clues (2025-05-20)
- Time-Frequency-Based Attention Cache Memory Model for Real-Time Speech Separation (2025-05-19)
- SepPrune: Structured Pruning for Efficient Deep Speech Separation (2025-05-17)
- A Survey of Deep Learning for Complex Speech Spectrograms (2025-05-13)