David Samuel, Aditya Ganeshan, Jason Naradowsky
We propose a hierarchical meta-learning-inspired model for music source separation (Meta-TasNet) in which a generator model is used to predict the weights of individual extractor models. This enables efficient parameter sharing while still allowing for instrument-specific parameterization. Meta-TasNet is shown to be more effective than models trained independently or in a multi-task setting, and achieves performance comparable to state-of-the-art methods. Compared with the latter, our extractors contain fewer parameters and have faster run-time performance. We discuss important architectural considerations and explore the costs and benefits of this approach.
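The core idea, a generator network that predicts the weights of per-instrument extractors, can be sketched as a tiny hypernetwork. The sketch below is illustrative only: the dimensions, the single-layer extractor, and the function names are hypothetical stand-ins, not the paper's actual TasNet-based architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions (not from the paper): mixture feature size,
# instrument-embedding size, and extractor hidden size.
FEAT, EMB, HIDDEN = 16, 4, 8

# Generator: a single shared parameter matrix maps an instrument
# embedding to the weights of a small one-layer "extractor". Parameters
# are shared across instruments, yet each instrument still receives its
# own extractor parameterization -- the trade-off the abstract describes.
W_gen = rng.standard_normal((EMB, HIDDEN * FEAT)) * 0.1

def make_extractor(instr_emb):
    """Predict a per-instrument extractor weight matrix from an embedding."""
    flat = instr_emb @ W_gen              # shape: (HIDDEN * FEAT,)
    return flat.reshape(HIDDEN, FEAT)     # instrument-specific weights

def extract(W_extractor, mixture_feat):
    """Apply the generated extractor to mixture features (ReLU stand-in)."""
    return np.maximum(0.0, W_extractor @ mixture_feat)

# One embedding per source: vocals, drums, bass, other.
embeddings = rng.standard_normal((4, EMB))
mixture = rng.standard_normal(FEAT)

separated = [extract(make_extractor(e), mixture) for e in embeddings]
print(len(separated), separated[0].shape)  # four extractors, one per source
```

Only `W_gen` and the embeddings are learned; the extractor weights themselves are produced on the fly, which is why the extractors can stay small.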
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Music Source Separation | MUSDB18 | SDR (avg) | 5.52 | Meta-TasNet |
| Music Source Separation | MUSDB18 | SDR (bass) | 5.58 | Meta-TasNet |
| Music Source Separation | MUSDB18 | SDR (drums) | 5.91 | Meta-TasNet |
| Music Source Separation | MUSDB18 | SDR (other) | 4.19 | Meta-TasNet |
| Music Source Separation | MUSDB18 | SDR (vocals) | 6.40 | Meta-TasNet |