TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

2 machine learning datasets

Filter by Modality

  • Images3,275
  • Texts3,148
  • Videos1,019
  • Audio486
  • Medical395
  • 3D383
  • Time series298
  • Graphs285
  • Tabular271
  • Speech199
  • RGB-D192
  • Environment148
  • Point cloud135
  • Biomedical123
  • LiDAR95
  • RGB Video87
  • Tracking78
  • Biology71
  • Actions68
  • 3d meshes65
  • Tables52
  • Music48
  • EEG45
  • Hyperspectral images45
  • Stereo44
  • MRI39
  • Physics32
  • Interactive29
  • Dialog25
  • Midi22
  • 6D17
  • Replay data11
  • Financial10
  • Ranking10
  • Cad9
  • fMRI7
  • Parallel6
  • Lyrics2
  • PSG2
Clear filter

2 dataset results

Lakh MIDI Dataset

The Lakh MIDI dataset is a collection of 176,581 unique MIDI files, 45,129 of which have been matched and aligned to entries in the Million Song Dataset. Its goal is to facilitate large-scale music information retrieval, both symbolic (using the MIDI files alone) and audio content-based (using information extracted from the MIDI files as annotations for the matched audio files). Around 10% of all MIDI files include timestamped lyrics events with lyrics are often transcribed at the word, syllable or character level.

36 papers0 benchmarksAudio, Lyrics, Midi

iKala

The iKala dataset is a singing voice separation dataset that comprises of 252 30-second excerpts sampled from 206 iKala songs (plus 100 hidden excerpts reserved for MIREX data mining contest). The music accompaniment and the singing voice are recorded at the left and right channels respectively. Additionally, the human-labeled pitch contours and timestamped lyrics are provided.

20 papers1 benchmarksAudio, Lyrics