Zhesong Yu, Xiaoshuo Xu, Xiaoou Chen, Deshun Yang
Cover song identification is a challenging task in the field of Music Information Retrieval (MIR) due to complex musical variations between query tracks and cover versions. Previous works typically utilize hand-crafted features and alignment algorithms for the task. More recently, further breakthroughs have been achieved with neural network approaches. In this paper, we propose a novel Convolutional Neural Network (CNN) architecture based on the characteristics of the cover song task. We first train the network through classification strategies; the network is then used to extract music representations for cover song identification. A scheme is designed to train models that are robust against tempo changes. Experimental results show that our approach outperforms state-of-the-art methods on all public datasets, improving the performance especially on the large dataset.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Cover song identification | SHS100K-TEST | MAP | 0.655 | CQT-Net |
| Cover song identification | Covers80 | MAP | 0.84 | CQT-Net |
| Cover song identification | YouTube350 | MAP | 0.917 | CQT-Net |
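
The MAP values above are retrieval scores: each track serves as a query, the remaining tracks are ranked by similarity, and the average precision over the true covers is averaged across queries. The following is a minimal sketch of that computation, not the authors' evaluation code. It assumes that each track has already been mapped to a fixed-length embedding, that ranking uses cosine similarity (an assumption, since the abstract does not name the similarity measure), and that tracks sharing a label are versions of the same work. The function name `mean_average_precision` is hypothetical.

```python
import numpy as np


def mean_average_precision(embeddings, labels):
    """Return MAP where every track queries the rest of the collection.

    Assumption: cosine similarity ranks candidates; tracks with equal
    labels are covers of the same work.
    """
    emb = np.asarray(embeddings, dtype=float)
    labels = np.asarray(labels)
    # L2-normalise rows so the dot product equals cosine similarity.
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    sim = emb @ emb.T

    ap_scores = []
    for q in range(len(labels)):
        # Exclude the query itself from its candidate list.
        candidates = np.delete(np.arange(len(labels)), q)
        order = candidates[np.argsort(-sim[q, candidates], kind="stable")]
        relevant = labels[order] == labels[q]
        if not relevant.any():
            continue  # skip queries that have no cover in the collection
        hits = np.cumsum(relevant)
        ranks = np.arange(1, len(order) + 1)
        # Precision measured at the rank of each true cover.
        ap_scores.append((hits[relevant] / ranks[relevant]).mean())

    return float(np.mean(ap_scores)) if ap_scores else 0.0
```

A perfect ranking places every cover ahead of every non-cover and yields a MAP of 1.0, so the values in the table can be read as how close each ranking comes to that ideal.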