W. Zai El Amri, O. Tautz, H. Ritter, A. Melnik
In this work, we demonstrate how a publicly available, pre-trained Jukebox model can be adapted to the problem of audio source separation from a single mixed audio channel. Our neural network architecture, which uses transfer learning, is quick to train, and the results demonstrate performance comparable to other state-of-the-art approaches that require significantly more compute resources, training data, and time. We provide an open-source implementation of our architecture (https://github.com/wzaielamri/unmix).
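The transfer-learning setup described above can be sketched as follows. This is a hypothetical illustration, not the paper's actual architecture: a frozen "pre-trained" encoder (standing in for Jukebox's learned audio representations, here just random weights) feeds a small trainable head that predicts one output chunk per target stem. All names and sizes are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

N_SAMPLES = 2048   # mono audio chunk length (illustrative)
N_FEATURES = 64    # encoder embedding size (illustrative)
N_STEMS = 4        # bass, drums, vocals, other

# Stand-in for pre-trained encoder weights: frozen, never updated
# during fine-tuning.
W_enc = rng.standard_normal((N_SAMPLES, N_FEATURES)) / np.sqrt(N_SAMPLES)

# Trainable separation head: the only parameters updated when
# transfer-learning on the source-separation task.
W_head = rng.standard_normal((N_FEATURES, N_STEMS * N_SAMPLES)) * 0.01

def separate(mix: np.ndarray) -> np.ndarray:
    """Map one mixed audio chunk to N_STEMS estimated source chunks."""
    features = np.tanh(mix @ W_enc)                      # frozen encoder
    stems = (features @ W_head).reshape(N_STEMS, N_SAMPLES)
    return stems

mix = rng.standard_normal(N_SAMPLES)   # a mixed single-channel chunk
stems = separate(mix)
print(stems.shape)  # (4, 2048): one estimated chunk per stem
```

In practice only the head's parameters would receive gradients, which is what makes such a setup fast to train relative to learning a separator from scratch.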
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Music Source Separation | MUSDB18-HQ | SDR (avg) | 4.188 | Unmix |
| Music Source Separation | MUSDB18-HQ | SDR (bass) | 4.073 | Unmix |
| Music Source Separation | MUSDB18-HQ | SDR (drums) | 4.925 | Unmix |
| Music Source Separation | MUSDB18-HQ | SDR (others) | 2.695 | Unmix |
| Music Source Separation | MUSDB18-HQ | SDR (vocals) | 5.06 | Unmix |