486 machine learning datasets
35 recordings of Candombe music with beat and downbeat annotations.
S. W. Hainsworth and M. D. Macleod, “Particle filtering applied to musical tempo tracking,” EURASIP Journal on Advances in Signal Processing, vol. 2004, pp. 1–11, 2004.
Beats, downbeats, and functional structural annotations for 912 Pop tracks.
J. Hockman, M. E. Davies, and I. Fujinaga, “One in the jungle: Downbeat detection in hardcore, jungle, and drum and bass.” in Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), 2012.
Eremenko, E. Demirel, B. Bozkurt, and X. Serra, “Audio-aligned jazz harmony dataset for automatic chord transcription and corpus-based research,” in Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), 2018.
F. Gouyon, “A computational approach to rhythm description — Audio features for the computation of rhythm periodicity functions and their use in tempo induction and music content processing,” Ph.D. dissertation, Universitat Pompeu Fabra, 2006.
A. Holzapfel, M. E. Davies, J. R. Zapata, J. L. Oliveira, and F. Gouyon, “Selective sampling for beat tracking evaluation,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 9, pp. 2539–2548, 2012.
J. Driedger, H. Schreiber, W. B. de Haas, and M. Müller, “Towards automatically correcting tapped beat annotations for music recordings,” in Proceedings of the International Society for Music Information Retrieval Conference (ISMIR), 2019.
A novel audio-visual mouse saliency (AViMoS) dataset with the following key features:
A new large-scale, in-the-wild Mandarin dataset, CAS-VSR-S101, with 101.1 hours of data. The videos are sourced from broadcast news and conversational programs in Chinese, covering a highly diverse set of topics, speakers, and filming conditions. Utterance lengths are naturally distributed between 0.01 s and 10.57 s, and image quality and resolution vary. News accounts for 82.4% of the programs; 70.4% of the utterances depict news anchors, hosts, and correspondents, while 29.6% are those of interviewees and guests. In addition, male and female appearances are relatively balanced, at a ratio of approximately 1.5 : 1. The dataset is divided into train, validation, and test sets by TV channel to minimize speaker overlap, at a ratio of roughly 8 : 1 : 1.5 in terms of duration. The validation and test sets are composed of programs broadcast on provincial TV channels. The dataset is available for academic use under a license.
Characterising multimedia content with relevant, reliable, and discriminating tags is vital for multimedia information retrieval. With the rapid expansion of digital multimedia content, alternative methods to existing explicit tagging are needed to enrich the pool of tagged content. Currently, social media websites encourage users to tag their content. However, the users’ intent when tagging multimedia content does not always match information retrieval goals. A large portion of user-defined tags are either motivated by increasing a user’s popularity and reputation in an online community or based on individual and egoistic judgments. Moreover, users do not evaluate media content on the same criteria. Some might tag multimedia content with words that express their emotions, while others might use tags to describe the content. For example, a picture may receive different tags based on the objects in the image, the camera with which the picture was taken, or the emotion a user felt while looking at it.
Many research articles have explored the impact of surgical interventions on voice and speech evaluations, but advances are limited by the lack of publicly accessible datasets. To address this, a comprehensive corpus of 107 Castilian Spanish speakers was recorded, including control speakers and patients who underwent upper airway surgeries such as tonsillectomy, functional endoscopic sinus surgery, and septoplasty. The dataset contains 3,800 audio files, averaging 35.51 ± 5.91 recordings per patient. This resource enables systematic investigation of the effects of upper respiratory tract surgery on voice and speech. Previous studies using this corpus have shown no relevant changes in key acoustic parameters for sustained vowel phonation, consistent with initial hypotheses. However, the analysis of speech recordings, particularly nasalised segments, remains open for further research. Additionally, this dataset facilitates the study of the impact of upper airway surgery on speaker recognition.
A temporal dataset for indoor and in-vehicle thermal comfort estimation. Thermal comfort estimation is essential for enhancing user experience in static indoor environments and dynamic in-vehicle scenarios. While traditional datasets focus on buildings, their application to fast-changing conditions, such as in vehicles, remains unexplored. We address this gap by introducing two temporal datasets collected from (1) a self-built climatic chamber with 31 sensor signals and user-labeled ratings from 18 participants and (2) in-vehicle studies with 20 participants in a BMW 3 Series.
The United-Syn-Med dataset is a specialized medical speech dataset designed to evaluate and improve Automatic Speech Recognition (ASR) systems within the healthcare domain. It comprises English medical speech recordings, with a particular focus on medical terminology and clinical conversations. The dataset is well-suited for various ASR tasks, including speech recognition, transcription, and classification, facilitating the development of models tailored for medical contexts.
PC-GITA is a Spanish speech corpus designed to analyze speech impairments in individuals with Parkinson's Disease (PD).
Guitar-TECHS is a comprehensive dataset featuring a variety of guitar techniques, musical excerpts, chords, and scales, performed by diverse musicians across various recording settings. Guitar-TECHS incorporates recordings from two stereo microphones: an egocentric microphone positioned on the performer’s head and an exocentric microphone placed in front of the performer. It also includes direct-input recordings and microphoned amplifier outputs, offering a wide spectrum of audio inputs and recording qualities. All signals and MIDI labels are properly synchronized. Its multi-perspective and multi-modal content makes Guitar-TECHS a valuable resource for advancing data-driven guitar research and for developing robust guitar-listening algorithms.
We collect a dataset of 805 clean videos showing the action of pouring water into a container. Our dataset covers over 50 unique containers made of 5 different materials, in 4 different shapes, and with both hot and cold water.