Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


SPGM: Prioritizing Local Features for enhanced speech separation performance

Jia Qi Yip, Shengkui Zhao, Yukun Ma, Chongjia Ni, Chong Zhang, Hao Wang, Trung Hieu Nguyen, Kun Zhou, Dianwen Ng, Eng Siong Chng, Bin Ma

2023-09-22 | Speech Separation

Abstract

The dual-path architecture is popular for speech separation models (e.g., Sepformer). It splits long sequences into overlapping chunks, which are processed by intra-blocks that model intra-chunk local features and inter-blocks that model inter-chunk global relationships. However, it has been found that inter-blocks, which comprise half of a dual-path model's parameters, contribute minimally to performance. Thus, we propose the Single-Path Global Modulation (SPGM) block to replace inter-blocks. SPGM is named after its structure, which consists of a parameter-free global pooling module followed by a modulation module comprising only 2% of the model's total parameters. The SPGM block allows all transformer layers in the model to be dedicated to local feature modelling, making the overall model single-path. SPGM achieves 22.1 dB SI-SDRi on WSJ0-2Mix and 20.4 dB SI-SDRi on Libri2Mix, exceeding the performance of Sepformer by 0.5 dB and 0.3 dB respectively, and matching the performance of recent SOTA models with up to 8 times fewer parameters. Model and weights are available at huggingface.co/yipjiaqi/spgm
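The pipeline the abstract describes (overlapping chunking, a parameter-free global pooling step, then a lightweight modulation step) can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function names, the choice of mean pooling, and the scale-and-shift form of the modulation are all assumptions made for illustration, and the abstract does not specify any of them.

```python
from typing import List

Frame = List[float]          # one feature vector
Chunk = List[Frame]          # chunk_len frames


def split_into_chunks(frames: List[Frame], chunk_len: int, hop: int) -> List[Chunk]:
    """Split a frame sequence into overlapping chunks, zero-padding the last one."""
    if not frames or chunk_len <= 0 or hop <= 0:
        raise ValueError("need non-empty frames and positive chunk_len/hop")
    dim = len(frames[0])
    chunks: List[Chunk] = []
    start = 0
    while True:
        chunk = [list(f) for f in frames[start:start + chunk_len]]
        chunk += [[0.0] * dim for _ in range(chunk_len - len(chunk))]
        chunks.append(chunk)
        if start + chunk_len >= len(frames):
            break
        start += hop
    return chunks


def global_pool(chunks: List[Chunk]) -> Frame:
    """Parameter-free pooling (ASSUMPTION: a plain mean over all chunks and frames)."""
    dim = len(chunks[0][0])
    count = sum(len(c) for c in chunks)
    return [sum(f[d] for c in chunks for f in c) / count for d in range(dim)]


def modulate(chunks: List[Chunk], g: Frame, scale: Frame, shift: Frame) -> List[Chunk]:
    """Inject the global vector g into every local frame.

    ASSUMPTION: a per-feature scale-and-shift, y = x * (1 + scale * g) + shift * g,
    where scale and shift stand in for the small set of learned parameters.
    """
    return [
        [[x * (1.0 + s * gd) + b * gd for x, gd, s, b in zip(f, g, scale, shift)]
         for f in c]
        for c in chunks
    ]
```

For example, ten frames with `chunk_len=4` and `hop=2` yield four chunks with 50% overlap; `global_pool` summarizes them into a single vector, and `modulate` feeds that summary back into each local frame without any inter-chunk layers.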

Results

Task               Dataset    Metric                    Value  Model
Speech Separation  WSJ0-2mix  MACs (G)                  77     SPGM + DM
Speech Separation  WSJ0-2mix  Number of parameters (M)  26.2   SPGM + DM
Speech Separation  WSJ0-2mix  SI-SDRi                   22.7   SPGM + DM
Speech Separation  WSJ0-2mix  MACs (G)                  77     SPGM
Speech Separation  WSJ0-2mix  Number of parameters (M)  26.2   SPGM
Speech Separation  WSJ0-2mix  SI-SDRi                   22.1   SPGM
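The SI-SDRi values above are improvements in scale-invariant signal-to-distortion ratio, in dB. As a rough illustration of how such a number is computed, here is a minimal pure-Python sketch of the standard SI-SDR definition and its improvement over the unprocessed mixture. The function names and the `eps` stabilizer are illustrative choices, not taken from the paper or its code, and the paper's evaluation setup may differ in details.

```python
import math
from typing import Sequence


def si_sdr(estimate: Sequence[float], reference: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant SDR in dB between an estimated and a reference signal."""
    if len(estimate) != len(reference) or len(reference) == 0:
        raise ValueError("signals must be non-empty and of equal length")
    # Remove the mean so a DC offset does not affect the score.
    est_mean = sum(estimate) / len(estimate)
    ref_mean = sum(reference) / len(reference)
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]
    # Project the estimate onto the reference to get the scaled target.
    alpha = sum(e * r for e, r in zip(est, ref)) / (sum(r * r for r in ref) + eps)
    target = [alpha * r for r in ref]
    noise = [e - t for e, t in zip(est, target)]
    target_energy = sum(t * t for t in target)
    noise_energy = sum(n * n for n in noise)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


def si_sdr_improvement(estimate: Sequence[float], mixture: Sequence[float],
                       reference: Sequence[float]) -> float:
    """SI-SDRi: SI-SDR of the separated estimate minus SI-SDR of the raw mixture."""
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)
```

Because the estimate is projected onto the reference before the ratio is taken, rescaling the estimate leaves the score unchanged, which is why the metric is called scale-invariant.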

Related Papers

Dynamic Slimmable Networks for Efficient Speech Separation (2025-07-08)
Improving Practical Aspects of End-to-End Multi-Talker Speech Recognition for Online and Offline Scenarios (2025-06-17)
SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline (2025-05-25)
Attractor-Based Speech Separation of Multiple Utterances by Unknown Number of Speakers (2025-05-22)
Single-Channel Target Speech Extraction Utilizing Distance and Room Clues (2025-05-20)
Time-Frequency-Based Attention Cache Memory Model for Real-Time Speech Separation (2025-05-19)
SepPrune: Structured Pruning for Efficient Deep Speech Separation (2025-05-17)
A Survey of Deep Learning for Complex Speech Spectrograms (2025-05-13)