TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Voice Separation with an Unknown Number of Multiple Speakers

Voice Separation with an Unknown Number of Multiple Speakers

Eliya Nachmani, Yossi Adi, Lior Wolf

2020-02-29ICML 2020 1Speech Separation
PaperPDFCodeCodeCodeCode(official)

Abstract

We present a new method for separating a mixed audio sequence, in which multiple voices speak simultaneously. The new method employs gated neural networks that are trained to separate the voices at multiple processing steps, while maintaining the speaker in each output channel fixed. A different model is trained for every number of possible speakers, and the model with the largest number of speakers is employed to select the actual number of speakers in a given sample. Our method greatly outperforms the current state of the art, which, as we show, is not competitive for more than two speakers.

Results

TaskDatasetMetricValueModel
Speech SeparationWHAMR!SI-SDRi12.2VSUNOS
Speech SeparationWSJ0-5mixSI-SDRi10.56Gated DualPathRNN
Speech SeparationWSJ0-2mixSI-SDRi20.12Gated DualPathRNN
Speech SeparationWSJ0-3mixSI-SDRi16.85Gated DualPathRNN
Speech SeparationWSJ0-4mixSI-SDRi12.88Gated DualPathRNN

Related Papers

Dynamic Slimmable Networks for Efficient Speech Separation2025-07-08Improving Practical Aspects of End-to-End Multi-Talker Speech Recognition for Online and Offline Scenarios2025-06-17SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline2025-05-25Attractor-Based Speech Separation of Multiple Utterances by Unknown Number of Speakers2025-05-22Single-Channel Target Speech Extraction Utilizing Distance and Room Clues2025-05-20Time-Frequency-Based Attention Cache Memory Model for Real-Time Speech Separation2025-05-19SepPrune: Structured Pruning for Efficient Deep Speech Separation2025-05-17A Survey of Deep Learning for Complex Speech Spectrograms2025-05-13