Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Multi-Decoder DPRNN: High Accuracy Source Counting and Separation

Junzhe Zhu, Raymond Yeh, Mark Hasegawa-Johnson

2020-11-24 | Vocal Bursts Intensity Prediction | Speech Separation

Abstract

We propose an end-to-end trainable approach to single-channel speech separation with an unknown number of speakers. Our approach extends the MulCat source separation backbone with additional output heads: a count-head to infer the number of speakers, and decoder-heads for reconstructing the original signals. Beyond the model, we also propose a metric for evaluating source separation with a variable number of speakers. Specifically, we address how to evaluate quality when the ground truth has more or fewer speakers than the model predicts. We evaluate our approach on the WSJ0-mix datasets, with mixtures of up to five speakers. We demonstrate that our approach outperforms the state of the art in counting the number of speakers and remains competitive in the quality of the reconstructed signals.
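The count-head and decoder-head arrangement described in the abstract can be pictured with a toy sketch. Everything here is an assumption made for illustration: the abstract does not say how heads are selected, so the sketch assumes one decoder head per candidate speaker count (2 to 5) and picks the head whose count scores highest. The names `make_toy_decoder`, `predict_count`, and `separate` are hypothetical, and the toy decoder is a placeholder, not the paper's network.

```python
from typing import Callable, Dict, List

Signal = List[float]
Decoder = Callable[[Signal], List[Signal]]


def make_toy_decoder(num_speakers: int) -> Decoder:
    """Hypothetical stand-in for a decoder head that emits num_speakers signals."""
    def decode(features: Signal) -> List[Signal]:
        # Placeholder behaviour: split the input evenly across the outputs.
        return [[x / num_speakers for x in features] for _ in range(num_speakers)]
    return decode


def predict_count(count_scores: Dict[int, float]) -> int:
    """Return the speaker count with the highest count-head score."""
    return max(count_scores, key=lambda n: count_scores[n])


def separate(features: Signal,
             count_scores: Dict[int, float],
             decoders: Dict[int, Decoder]) -> List[Signal]:
    """Route the shared features to the decoder head chosen by the count-head."""
    return decoders[predict_count(count_scores)](features)


# Assumed candidate counts: 2 to 5 speakers.
decoders = {n: make_toy_decoder(n) for n in (2, 3, 4, 5)}
scores = {2: 0.10, 3: 0.70, 4: 0.15, 5: 0.05}  # made-up count-head scores
estimates = separate([0.6, -0.3, 0.9], scores, decoders)
```

With these made-up scores the count-head picks three speakers, so `estimates` holds three signals of the same length as the input.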

Results

Task | Dataset | Metric | Value | Model
Speech Separation | WSJ0-5mix | SI-SDRi | 5.9 | Multi-Decoder DPRNN
Speech Separation | WSJ0-4mix | SI-SDRi | 9.3 | Multi-Decoder DPRNN
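
For readers unfamiliar with the metric in the table, SI-SDRi is the improvement in scale-invariant signal-to-distortion ratio (in dB) of a separated estimate over the unprocessed mixture. The sketch below uses the standard single-source definition in pure Python. It is not the paper's evaluation code and does not cover the variable-speaker-count metric the abstract proposes; the function names and the `eps` stabiliser are illustrative choices.

```python
import math
from typing import Sequence


def si_sdr(estimate: Sequence[float], target: Sequence[float],
           eps: float = 1e-8) -> float:
    """Scale-invariant SDR in dB between an estimate and a reference signal."""
    n = len(target)
    # Remove the mean so a constant offset does not affect the score.
    est_mean = sum(estimate) / n
    tgt_mean = sum(target) / n
    est = [e - est_mean for e in estimate]
    tgt = [t - tgt_mean for t in target]
    # Project the estimate onto the target to find the optimally scaled target.
    scale = sum(e * t for e, t in zip(est, tgt)) / (sum(t * t for t in tgt) + eps)
    projection = [scale * t for t in tgt]
    residual = [e - p for e, p in zip(est, projection)]
    signal_energy = sum(p * p for p in projection)
    noise_energy = sum(r * r for r in residual)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


def si_sdr_improvement(estimate: Sequence[float], target: Sequence[float],
                       mixture: Sequence[float]) -> float:
    """SI-SDRi: SI-SDR of the estimate minus SI-SDR of the raw mixture."""
    return si_sdr(estimate, target) - si_sdr(mixture, target)


# Toy signals: the interference is orthogonal to the target.
target = [1.0, -1.0, 1.0, -1.0]
interference = [0.5, 0.5, -0.5, -0.5]
mixture = [t + i for t, i in zip(target, interference)]
estimate = [t + 0.1 * i for t, i in zip(target, interference)]
improvement = si_sdr_improvement(estimate, target, mixture)
```

In this toy case the estimate keeps one tenth of the interference amplitude, so the improvement is about 20 dB. Rescaling the estimate leaves the score unchanged, which is the "scale-invariant" part of the name.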

Related Papers

Dynamic Slimmable Networks for Efficient Speech Separation (2025-07-08)
Improving Practical Aspects of End-to-End Multi-Talker Speech Recognition for Online and Offline Scenarios (2025-06-17)
SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline (2025-05-25)
Attractor-Based Speech Separation of Multiple Utterances by Unknown Number of Speakers (2025-05-22)
Single-Channel Target Speech Extraction Utilizing Distance and Room Clues (2025-05-20)
Time-Frequency-Based Attention Cache Memory Model for Real-Time Speech Separation (2025-05-19)
SepPrune: Structured Pruning for Efficient Deep Speech Separation (2025-05-17)
A Survey of Deep Learning for Complex Speech Spectrograms (2025-05-13)