Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

On Time Domain Conformer Models for Monaural Speech Separation in Noisy Reverberant Acoustic Environments

William Ravenscroft, Stefan Goetze, Thomas Hain

2023-10-09 · Speech Separation

Abstract

Speech separation remains an important topic for multi-speaker technology researchers. Convolution-augmented transformers (conformers) have performed well for many speech processing tasks but have been under-researched for speech separation. Most recent state-of-the-art (SOTA) separation models have been time-domain audio separation networks (TasNets). A number of successful models have made use of dual-path (DP) networks, which sequentially process local and global information. Time-domain conformers (TD-Conformers) are an analogue of the DP approach in that they also process local and global context sequentially, but they have a different time complexity function. It is shown that for realistic shorter signal lengths, conformers are more efficient when controlling for feature dimension. Subsampling layers are proposed to further improve computational efficiency. The best TD-Conformer achieves 14.6 dB and 21.2 dB SI-SDR improvement on the WHAMR! and WSJ0-2mix benchmarks, respectively.
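The SI-SDR improvement quoted in the abstract can be illustrated with a minimal sketch. It follows the commonly used definition of scale-invariant signal-to-distortion ratio (project the estimate onto the reference, then compare target energy to residual energy) and reports the gain of the separated estimate over the unprocessed mixture. The function names `si_sdr` and `si_sdr_improvement`, the mean removal step, and the `eps` stabiliser are assumptions made for illustration; this is not the authors' evaluation code.

```python
import math


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB for two equal-length sample sequences.

    Hypothetical helper: assumes the common zero-mean convention.
    """
    if len(estimate) != len(reference):
        raise ValueError("estimate and reference must have the same length")
    # Remove the mean so that a DC offset does not affect the score.
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference to obtain the scaled target.
    dot = sum(e * r for e, r in zip(est, ref))
    ref_energy = sum(r * r for r in ref)
    scale = dot / (ref_energy + eps)
    target = [scale * r for r in ref]
    # Everything in the estimate that is not the scaled target is distortion.
    residual = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(n * n for n in residual)
    return 10.0 * math.log10((target_energy + eps) / (residual_energy + eps))


def si_sdr_improvement(estimate, mixture, reference):
    """SI-SDR of the estimate minus SI-SDR of the unprocessed mixture, in dB."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)


# Toy usage: two sinusoids stand in for a target speaker and an interferer.
n_samples = 1000
target_speech = [math.sin(2 * math.pi * 5 * n / n_samples) for n in range(n_samples)]
interferer = [math.sin(2 * math.pi * 13 * n / n_samples) for n in range(n_samples)]
mixture = [s + i for s, i in zip(target_speech, interferer)]
# Pretend a separator attenuated the interferer to a tenth of its amplitude.
separated = [s + 0.1 * i for s, i in zip(target_speech, interferer)]
print(f"SI-SDRi: {si_sdr_improvement(separated, mixture, target_speech):.2f} dB")
```

Because the score is computed after projecting onto the reference, rescaling the estimate leaves it unchanged, which is why the metric is preferred over plain SDR when a model's output gain is arbitrary.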

Results

| Task | Dataset | Metric | Value (dB) | Model |
| Speech Separation | WHAMR! | SI-SDRi | 14.6 | TD-Conformer (XL) + DM |
| Speech Separation | WHAMR! | SI-SDRi | 13.4 | TD-Conformer (L) + DM |
| Speech Separation | WHAMR! | SI-SDRi | 12 | TD-Conformer (M) + DM |
| Speech Separation | WHAMR! | SI-SDRi | 10.5 | TD-Conformer (S) |
| Speech Separation | WSJ0-2mix | SI-SDRi | 21.2 | TD-Conformer (XL) + DM |

Related Papers

Dynamic Slimmable Networks for Efficient Speech Separation (2025-07-08)
Improving Practical Aspects of End-to-End Multi-Talker Speech Recognition for Online and Offline Scenarios (2025-06-17)
SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline (2025-05-25)
Attractor-Based Speech Separation of Multiple Utterances by Unknown Number of Speakers (2025-05-22)
Single-Channel Target Speech Extraction Utilizing Distance and Room Clues (2025-05-20)
Time-Frequency-Based Attention Cache Memory Model for Real-Time Speech Separation (2025-05-19)
SepPrune: Structured Pruning for Efficient Deep Speech Separation (2025-05-17)
A Survey of Deep Learning for Complex Speech Spectrograms (2025-05-13)