
Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation

Daniel Stoller, Sebastian Ewert, Simon Dixon

2018-06-08 · Audio Source Separation · Music Source Separation
Paper · PDF · Code (official)

Abstract

Models for audio source separation usually operate on the magnitude spectrum, which ignores phase information and makes separation performance dependent on hyper-parameters for the spectral front-end. Therefore, we investigate end-to-end source separation in the time-domain, which allows modelling phase information and avoids fixed spectral transformations. Due to high sampling rates for audio, employing a long temporal input context on the sample level is difficult, but required for high quality separation results because of long-range temporal correlations. In this context, we propose the Wave-U-Net, an adaptation of the U-Net to the one-dimensional time domain, which repeatedly resamples feature maps to compute and combine features at different time scales. We introduce further architectural improvements, including an output layer that enforces source additivity, an upsampling technique and a context-aware prediction framework to reduce output artifacts. Experiments for singing voice separation indicate that our architecture yields a performance comparable to a state-of-the-art spectrogram-based U-Net architecture, given the same data. Finally, we reveal a problem with outliers in the currently used SDR evaluation metrics and suggest reporting rank-based statistics to alleviate this problem.
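The architecture described in the abstract reduces to a short network definition. Below is a minimal, illustrative PyTorch sketch, not the authors' reference implementation: for brevity it substitutes same-padding convolutions for the paper's valid convolutions with additional input context, and plain linear interpolation for the paper's learned upsampling, but it keeps the two ideas the abstract highlights: repeated resampling of feature maps across time scales, and a difference output layer that enforces source additivity (the last source is the mixture minus the other estimates).

```python
# Minimal Wave-U-Net-style sketch (assumption: simplified, not the official
# code). Uses same-padding convolutions and linear-interpolation upsampling
# in place of the paper's valid convolutions and learned upsampling.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WaveUNet(nn.Module):
    def __init__(self, num_sources=2, depth=4, base_channels=24):
        super().__init__()
        self.down = nn.ModuleList()
        ch_in = 1
        for i in range(depth):
            ch_out = base_channels * (i + 1)
            self.down.append(nn.Conv1d(ch_in, ch_out, kernel_size=15, padding=7))
            ch_in = ch_out
        self.bottleneck = nn.Conv1d(ch_in, ch_in, kernel_size=15, padding=7)
        self.up = nn.ModuleList()
        for i in reversed(range(depth)):
            ch_out = base_channels * (i + 1)
            # each up block sees the upsampled features concatenated with a skip
            self.up.append(nn.Conv1d(ch_in + ch_out, ch_out, kernel_size=5, padding=2))
            ch_in = ch_out
        # Difference output layer: predict K-1 sources; the K-th is derived
        # from the mixture so that all estimates sum to the input.
        self.out = nn.Conv1d(ch_in, num_sources - 1, kernel_size=1)

    def forward(self, mix):                        # mix: (batch, 1, time)
        skips, x = [], mix
        for conv in self.down:
            x = F.leaky_relu(conv(x))
            skips.append(x)
            x = x[:, :, ::2]                       # decimate by 2 (downsampling)
        x = F.leaky_relu(self.bottleneck(x))
        for conv in self.up:
            skip = skips.pop()
            x = F.interpolate(x, size=skip.shape[-1],
                              mode='linear', align_corners=True)  # upsample
            x = F.leaky_relu(conv(torch.cat([x, skip], dim=1)))
        first = torch.tanh(self.out(x))            # K-1 source estimates
        last = mix - first.sum(dim=1, keepdim=True)  # enforce additivity
        return torch.cat([first, last], dim=1)     # (batch, K, time)

# Input length must be divisible by 2**depth for the decimation steps.
est = WaveUNet()(torch.randn(2, 1, 16384))
print(est.shape)  # torch.Size([2, 2, 16384])
```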

Results

Task                     Dataset  Metric        Value (dB)  Model
Music Source Separation  MUSDB18  SDR (avg)     3.23        STL2
Music Source Separation  MUSDB18  SDR (bass)    3.21        STL2
Music Source Separation  MUSDB18  SDR (drums)   4.22        STL2
Music Source Separation  MUSDB18  SDR (other)   2.25        STL2
Music Source Separation  MUSDB18  SDR (vocals)  3.25        STL2
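The abstract's caveat about outliers in SDR-based evaluation is easy to demonstrate: on segments where a source is nearly silent, SDR can become arbitrarily negative, so a single such track dominates the mean while rank-based statistics such as the median stay representative. A small NumPy illustration with invented values (not results from the paper or the table above):

```python
# Hedged illustration of the SDR-outlier problem; the values are made up
# for demonstration and are not taken from the paper.
import numpy as np

sdr_per_track = np.array([4.1, 3.8, 5.0, 3.2, -38.5])   # one outlier track
print(f"mean SDR:   {sdr_per_track.mean():.2f} dB")      # -4.48 dB
print(f"median SDR: {np.median(sdr_per_track):.2f} dB")  #  3.80 dB
```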

Related Papers

Towards Reliable Objective Evaluation Metrics for Generative Singing Voice Separation Models (2025-07-15)
DGMO: Training-Free Audio Source Separation through Diffusion-Guided Mask Optimization (2025-06-03)
ZeroSep: Separate Anything in Audio with Zero Training (2025-05-29)
Text-Queried Audio Source Separation via Hierarchical Modeling (2025-05-27)
Music Source Restoration (2025-05-27)
Training-Free Multi-Step Audio Source Separation (2025-05-26)
Is MixIT Really Unsuitable for Correlated Sources? Exploring MixIT for Unsupervised Pre-training in Music Source Separation (2025-05-12)
Score Distillation Sampling for Audio: Source Separation, Synthesis, and Beyond (2025-05-07)