SepIt: Approaching a Single Channel Speech Separation Bound
Shahar Lutati, Eliya Nachmani, Lior Wolf
Abstract
We present an upper bound for the single-channel speech separation task, based on an assumption about the nature of short speech segments. Using this bound, we show that while recent methods have made significant progress for a small number of speakers, considerable room for improvement remains for five and ten speakers. We then introduce a deep neural network, SepIt, that iteratively improves the estimates of the different speakers. At test time, SepIt runs a varying number of iterations per sample, governed by a mutual-information criterion that arises from our analysis. In an extensive set of experiments, SepIt outperforms state-of-the-art neural networks for 2, 3, 5, and 10 speakers.
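The abstract's test-time loop can be sketched as follows. This is a toy illustration, not the paper's implementation: the `refine` callable stands in for the SepIt network, and the histogram-based mutual-information proxy and the `tol` threshold are assumptions used only to show the shape of an MI-governed stopping rule.

```python
import numpy as np

def histogram_mi(x, y, bins=32):
    # Crude mutual-information estimate (in nats) from a joint histogram.
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()
    px = pxy.sum(axis=1, keepdims=True)   # marginal of x
    py = pxy.sum(axis=0, keepdims=True)   # marginal of y
    nz = pxy > 0                          # avoid log(0)
    return float((pxy[nz] * np.log(pxy[nz] / (px @ py)[nz])).sum())

def separate_iteratively(mixture, refine, max_iters=10, tol=1e-3):
    # Repeatedly refine the estimate; stop early once the per-iteration
    # change in the MI proxy falls below `tol` (hypothetical criterion).
    est = mixture.copy()
    mi_prev = histogram_mi(est, mixture)
    for it in range(1, max_iters + 1):
        est = refine(est, mixture)
        mi_new = histogram_mi(est, mixture)
        if abs(mi_new - mi_prev) < tol:
            break
        mi_prev = mi_new
    return est, it
```

The key property this sketch preserves is that the number of iterations is decided per test sample by the criterion, rather than being fixed in advance.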
Results
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Speech Separation | WSJ0-2mix | SI-SDRi | 22.4 | SepIt |
| Speech Separation | WSJ0-3mix | SI-SDRi | 20.1 | SepIt |
| Speech Separation | Libri5Mix | SI-SDRi | 13.7 | SepIt |
| Speech Separation | Libri10Mix | SI-SDRi | 8.2 | SepIt |
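The metric in the table, SI-SDRi, is the improvement in scale-invariant signal-to-distortion ratio of the separated estimate over the raw mixture. A minimal NumPy sketch of the standard definition (function names are ours):

```python
import numpy as np

def si_sdr(estimate, reference, eps=1e-8):
    # Scale-invariant SDR (dB): project the estimate onto the reference
    # and compare the energy of that projection to the residual.
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference            # scaled reference component
    noise = estimate - target             # everything else is distortion
    return 10 * np.log10(np.dot(target, target) / (np.dot(noise, noise) + eps))

def si_sdri(estimate, reference, mixture):
    # Improvement over using the unprocessed mixture as the estimate.
    return si_sdr(estimate, reference) - si_sdr(mixture, reference)
```

So the 22.4 dB entry for WSJ0-2mix means the separated signals are, on average, 22.4 dB closer to the references (in the SI-SDR sense) than the mixture itself.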
Related Papers
- Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models (2025-07-15)
- Dynamic Slimmable Networks for Efficient Speech Separation (2025-07-08)
- Improving Practical Aspects of End-to-End Multi-Talker Speech Recognition for Online and Offline Scenarios (2025-06-17)
- DGMO: Training-Free Audio Source Separation through Diffusion-Guided Mask Optimization (2025-06-03)
- ZeroSep: Separate Anything in Audio with Zero Training (2025-05-29)
- Text-Queried Audio Source Separation via Hierarchical Modeling (2025-05-27)
- Training-Free Multi-Step Audio Source Separation (2025-05-26)
- SoloSpeech: Enhancing Intelligibility and Quality in Target Speech Extraction through a Cascaded Generative Pipeline (2025-05-25)