The MidiCaps dataset [1] is a large-scale collection of 168,385 MIDI music files, each paired with a descriptive text caption and a set of extracted musical features.

The captions were produced by a pipeline that first extracts music information retrieval (MIR) features from each file and then prompts the Claude 3 LLM, via in-context learning, to write a caption from those features. The captioning framework is available as open source on GitHub. The original MIDI files come from the Lakh MIDI Dataset [2,3] and are Creative Commons licensed.
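
To make the in-context learning step more concrete, here is a minimal sketch of how a few-shot prompt could be assembled from extracted features. It is not the MidiCaps authors' code: the feature names (`key`, `tempo_bpm`, `instruments`), the example caption, and the helper functions are all hypothetical, and the call to the LLM itself is left out.

```python
from typing import Dict, List, Tuple

# Hypothetical feature record: feature name -> human-readable value.
Features = Dict[str, str]


def format_features(features: Features) -> str:
    """Render features as one line; keys are sorted for a stable layout."""
    return "; ".join(f"{key}: {features[key]}" for key in sorted(features))


def build_fewshot_prompt(
    examples: List[Tuple[Features, str]], query: Features
) -> str:
    """Build an in-context learning prompt.

    Each (features, caption) pair is shown as a solved example; the query
    features are appended last with an empty caption slot for the model.
    """
    lines = [
        "Write a one-sentence caption for a MIDI file from its extracted features.",
        "",
    ]
    for feats, caption in examples:
        lines.append(f"Features: {format_features(feats)}")
        lines.append(f"Caption: {caption}")
        lines.append("")
    lines.append(f"Features: {format_features(query)}")
    lines.append("Caption:")
    return "\n".join(lines)


# Illustrative data only; these values are not taken from the dataset.
EXAMPLES = [
    (
        {"key": "C major", "tempo_bpm": "120", "instruments": "piano"},
        "A bright, mid-tempo solo piano piece in C major.",
    ),
]
QUERY = {"key": "A minor", "tempo_bpm": "80", "instruments": "strings, flute"}

prompt = build_fewshot_prompt(EXAMPLES, QUERY)
print(prompt)
```

The resulting string would be sent to the LLM as the user message; the model's completion after the final `Caption:` line would then serve as the caption for the query file.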