Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Music Source Separation in the Waveform Domain

Alexandre Défossez, Nicolas Usunier, Léon Bottou, Francis Bach

Date: 2019-11-27
Tasks: Audio Generation, Quantization, Data Augmentation, Multi-task Audio Source Separation, Music Source Separation, Audio Synthesis

Abstract

Source separation for music is the task of isolating contributions, or stems, from different instruments recorded individually and arranged together to form a song. Such components include voice, bass, drums and any other accompaniments. Contrary to many audio synthesis tasks where the best performances are achieved by models that directly generate the waveform, the state-of-the-art in source separation for music is to compute masks on the magnitude spectrum. In this paper, we compare two waveform domain architectures. We first adapt Conv-Tasnet, initially developed for speech source separation, to the task of music source separation. While Conv-Tasnet beats many existing spectrogram-domain methods, it suffers from significant artifacts, as shown by human evaluations. We propose instead Demucs, a novel waveform-to-waveform model, with a U-Net structure and bidirectional LSTM. Experiments on the MusDB dataset show that, with proper data augmentation, Demucs beats all existing state-of-the-art architectures, including Conv-Tasnet, with 6.3 SDR on average (and up to 6.8 with 150 extra training songs, even surpassing the IRM oracle for the bass source). Using recent developments in model quantization, Demucs can be compressed down to 120MB without any loss of accuracy. We also provide human evaluations, showing that Demucs benefits from a large advantage in terms of the naturalness of the audio. However, it suffers from some bleeding, especially between the vocals and the "other" source.
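The abstract reports quality as SDR (signal-to-distortion ratio, in dB). As a rough illustration of what such a number measures, here is a minimal sketch of a simplified SDR: the energy of the reference stem divided by the energy of the estimation error, on a log scale. This is an assumption for illustration only. Benchmark figures on MusDB are usually computed with BSS Eval tooling, which aggregates over frames and tracks and will not match this formula exactly. The function name `simple_sdr` and the toy signals are hypothetical.

```python
import math


def simple_sdr(reference, estimate, eps=1e-10):
    """Simplified SDR in dB: 10 * log10(||ref||^2 / ||ref - est||^2).

    Illustrative only; not the BSS Eval metric used for benchmark tables.
    """
    if len(reference) != len(estimate):
        raise ValueError("reference and estimate must have the same length")
    signal_energy = sum(r * r for r in reference)
    error_energy = sum((r - e) ** 2 for r, e in zip(reference, estimate))
    # eps keeps the ratio finite when the estimate is perfect or the reference is silent
    return 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))


# Toy signals: one second of a 440 Hz sine at 8 kHz, and an estimate scaled by 0.9.
reference = [math.sin(2 * math.pi * 440 * n / 8000) for n in range(8000)]
estimate = [0.9 * r for r in reference]

# The error is 10% of the reference amplitude, i.e. 1% of its energy.
print(round(simple_sdr(reference, estimate), 2))  # prints 20.0
```

Under this simplified definition, a 10% amplitude error corresponds to 20 dB, which gives a sense of scale for the single-digit SDR values reported for full music mixtures.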

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Music Source Separation | MUSDB18 | SDR (avg) | 6.79 | DEMUCS (extra) |
| Music Source Separation | MUSDB18 | SDR (bass) | 7.6 | DEMUCS (extra) |
| Music Source Separation | MUSDB18 | SDR (drums) | 7.58 | DEMUCS (extra) |
| Music Source Separation | MUSDB18 | SDR (other) | 4.69 | DEMUCS (extra) |
| Music Source Separation | MUSDB18 | SDR (vocals) | 7.29 | DEMUCS (extra) |
| Music Source Separation | MUSDB18 | SDR (avg) | 6.28 | DEMUCS |
| Music Source Separation | MUSDB18 | SDR (bass) | 7.01 | DEMUCS |
| Music Source Separation | MUSDB18 | SDR (drums) | 6.86 | DEMUCS |
| Music Source Separation | MUSDB18 | SDR (other) | 4.42 | DEMUCS |
| Music Source Separation | MUSDB18 | SDR (vocals) | 6.84 | DEMUCS |
| 2D Classification | MUSDB18 | SDR (avg) | 6.79 | DEMUCS (extra) |
| 2D Classification | MUSDB18 | SDR (bass) | 7.6 | DEMUCS (extra) |
| 2D Classification | MUSDB18 | SDR (drums) | 7.58 | DEMUCS (extra) |
| 2D Classification | MUSDB18 | SDR (other) | 4.69 | DEMUCS (extra) |
| 2D Classification | MUSDB18 | SDR (vocals) | 7.29 | DEMUCS (extra) |
| 2D Classification | MUSDB18 | SDR (avg) | 6.28 | DEMUCS |
| 2D Classification | MUSDB18 | SDR (bass) | 7.01 | DEMUCS |
| 2D Classification | MUSDB18 | SDR (drums) | 6.86 | DEMUCS |
| 2D Classification | MUSDB18 | SDR (other) | 4.42 | DEMUCS |
| 2D Classification | MUSDB18 | SDR (vocals) | 6.84 | DEMUCS |
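
The "SDR (avg)" rows appear consistent with an unweighted mean of the four per-source scores. That reading is an assumption, since the page does not state how the average is formed. The short check below recomputes the mean from the per-source values listed above and compares it with the reported figure.

```python
from statistics import mean

# Per-source SDR values copied from the results listed above.
per_source_sdr = {
    "DEMUCS": {"bass": 7.01, "drums": 6.86, "other": 4.42, "vocals": 6.84},
    "DEMUCS (extra)": {"bass": 7.60, "drums": 7.58, "other": 4.69, "vocals": 7.29},
}
reported_avg = {"DEMUCS": 6.28, "DEMUCS (extra)": 6.79}

# Assumption: "avg" is the plain arithmetic mean over the four sources.
computed_avg = {model: mean(scores.values()) for model, scores in per_source_sdr.items()}

for model, avg in computed_avg.items():
    print(f"{model}: mean={avg:.2f}, reported={reported_avg[model]}")
# DEMUCS: mean=6.28, reported=6.28
# DEMUCS (extra): mean=6.79, reported=6.79
```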
