48 machine learning datasets
Giantsteps is a dataset of songs in major and minor keys across all 12 pitch classes, i.e., a 24-way key classification task.
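The 24-way label space can be sketched as 12 pitch classes crossed with two modes. A minimal sketch follows; the major-first class ordering is an assumed convention for illustration, not necessarily the one Giantsteps uses.

```python
# 12 pitch classes x {major, minor} = 24 key classes.
PITCH_CLASSES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def key_to_class(tonic: str, mode: str) -> int:
    """Map a (tonic, mode) pair to an integer class in [0, 23].
    Major keys occupy 0-11, minor keys 12-23 (an assumed convention)."""
    offset = 0 if mode == "major" else 12
    return offset + PITCH_CLASSES.index(tonic)

def class_to_key(idx: int) -> tuple[str, str]:
    """Inverse mapping: class index back to (tonic, mode)."""
    mode = "major" if idx < 12 else "minor"
    return PITCH_CLASSES[idx % 12], mode
```

For example, under this convention C major maps to class 0 and A minor to class 21.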
A MIDI dataset of 500 4-part chorales generated by the KS_Chorus algorithm, annotated with results from hundreds of listening test participants, with 500 further unannotated chorales.
The Image-MusicEmotion-Matching-Net (IMEMNet) dataset supports continuous emotion-based image and music matching. It contains over 140K image-music pairs.
This dataset includes all music sources, background noises, impulse response (IR) samples, and conversational speech used in the work "Neural Audio Fingerprint for High-specific Audio Retrieval based on Contrastive Learning" (ICASSP 2021, https://arxiv.org/abs/2010.11910).
1000 songs were selected from the Free Music Archive (FMA). The annotated excerpts are available in the same package (song IDs 1 to 1000). Some redundancies were identified, which reduced the dataset to 744 songs, split into a development set (619 songs) and an evaluation set (125 songs). The extracted 45-second excerpts are all re-encoded to the same sampling frequency, i.e., 44100 Hz.
Lyra is a dataset of 1570 traditional and folk Greek music pieces that includes audio and video (timestamps and links to YouTube videos), along with annotations that describe aspects of particular interest for this dataset, including instrumentation, geographic information and labels of genre and subgenre, among others.
The YouTube8M-MusicTextClips dataset consists of over 4k high-quality human text descriptions of music found in video clips from the YouTube8M dataset.
JamALT is a revision of the JamendoLyrics dataset (80 songs in 4 languages), adapted for use as an automatic lyrics transcription (ALT) benchmark.
The SynthSOD dataset contains more than 47 hours of multitrack music obtained by synthesizing orchestra and ensemble pieces from the Symbolic Orchestral Database (SOD) using the Spitfire BBC Symphony Orchestra Professional library. To synthesize the MIDI files from the SOD, we needed to convert the original files to the General MIDI standard, select a subset of files that fit our requirements (e.g., containing only instruments that we could synthesize), and develop a new system to generate musically motivated random annotations for tempo, dynamics, and articulation.
The AIME dataset contains 6,000 audio tracks generated by 12 music generation models in addition to 500 tracks from MTG-Jamendo. The prompts used to generate music are combinations of representative and diverse tags from the MTG-Jamendo dataset.
This dataset contains annotations for 5000 music files on the following music properties:
Dizi is a dataset of the music styles of the Northern and Southern schools. Characteristics of the two styles, including melody and playing techniques, are deconstructed.
ChMusic is a traditional Chinese music dataset for model training and performance evaluation of musical instrument recognition. The dataset covers 11 musical instruments: Erhu, Pipa, Sanxian, Dizi, Suona, Zhuiqin, Zhongruan, Liuqin, Guzheng, Yangqin, and Sheng.
We propose a dataset, AVASpeech-SMAD, to assist speech and music activity detection research. With frame-level music labels, the proposed dataset extends the existing AVASpeech dataset, which originally consists of 45 hours of audio and speech activity labels. To the best of our knowledge, the proposed AVASpeech-SMAD is the first open-source dataset that features strong polyphonic labels for both music and speech. The dataset was manually annotated and verified via an iterative cross-checking process. A simple automatic examination was also implemented to further improve the quality of the labels. Evaluation results from two state-of-the-art SMAD systems are also provided as a benchmark for future reference.
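The frame-level, possibly overlapping ("polyphonic") speech/music labels described above can be sketched by rasterizing segment annotations onto a frame grid. This is a minimal illustration; the segment tuple format and the 100 ms hop are assumptions, not the dataset's actual annotation format.

```python
FRAME_HOP = 0.1  # seconds per frame (assumed for illustration)

def segments_to_frames(segments, total_dur, hop=FRAME_HOP):
    """Rasterize segment annotations onto a frame grid.

    segments: list of (start_sec, end_sec, label) tuples with label in
    {"speech", "music"}; overlapping segments are allowed.
    Returns a list of per-frame sets of active labels, so a frame where
    speech and music co-occur carries both labels.
    """
    n_frames = int(round(total_dur / hop))
    frames = [set() for _ in range(n_frames)]
    for start, end, label in segments:
        first = int(start / hop)
        last = min(n_frames, int(round(end / hop)))
        for i in range(first, last):
            frames[i].add(label)
    return frames
```

With overlapping speech and music segments, the frames in the overlap region carry both labels, which is what distinguishes polyphonic labels from single-class activity labels.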
A dataset of music videos with continuous valence/arousal ratings as well as emotion tags.
Nlakh is a dataset for Musical Instrument Retrieval. It is a combination of the NSynth dataset, which provides a large number of instruments, and the Lakh dataset, which provides multi-track MIDI data.
YM2413-MDB is an 80s FM video game music dataset with multi-label emotion annotations. It includes 669 audio and MIDI files of music from 1980s Sega and MSX PC games using the YM2413, a programmable sound generator based on FM synthesis. The collected game music is arranged with a subset of 15 monophonic instruments and one drum instrument.
Virtuoso Strings is a dataset for soft-onset detection in string instruments. It consists of over 144 recordings of professional performances of an excerpt from Haydn's string quartet Op. 74 No. 1 Finale, each with corresponding per-instrument onset annotations.
The Haydn Annotation Dataset consists of note onset annotations from 24 experiment participants with varying musical experience. The annotation experiments use recordings from the ARME Virtuoso Strings Dataset.
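Onset annotations like these are typically compared across annotators (or against detector output) by matching onset times within a tolerance window. A minimal sketch follows; the greedy matching strategy and the 50 ms tolerance are common but assumed choices, not necessarily those used with these datasets.

```python
def match_onsets(ref, est, tol=0.05):
    """Greedily match estimated onset times (in seconds) to reference
    onsets within +/- tol seconds; each onset is matched at most once.
    Returns the number of matched pairs (the basis for precision,
    recall, and F-measure)."""
    ref = sorted(ref)
    est = sorted(est)
    i = j = matched = 0
    while i < len(ref) and j < len(est):
        if abs(ref[i] - est[j]) <= tol:
            matched += 1
            i += 1
            j += 1
        elif est[j] < ref[i]:
            j += 1  # spurious estimate, skip it
        else:
            i += 1  # missed reference onset, skip it
    return matched
```

For soft onsets, as produced by string instruments, annotator disagreement tends to grow, so the tolerance chosen strongly affects the reported agreement.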
jaCappella is a corpus of Japanese a cappella vocal ensembles (jaCappella corpus) for vocal ensemble separation and synthesis. It consists of 35 copyright-cleared vocal ensemble songs and their audio recordings of individual voice parts. These songs were arranged from out-of-copyright Japanese children's songs and have six voice parts (lead vocal, soprano, alto, tenor, bass, and vocal percussion). They are divided into seven subsets, each of which features typical characteristics of a music genre such as jazz and enka.