Artie Bias Corpus is an open dataset for detecting demographic bias in speech applications.
The dataset builds on VGG-Sound, which consists of 10-second clips collected from YouTube for 309 sound classes. A subset of 'temporally sparse' classes is selected using the following procedure: 5–15 videos are randomly picked from each of the 309 VGGSound classes and manually annotated as to whether audio-visual cues are only sparsely available. As a result, 12 classes (~4%) are selected, corresponding to 6.5k and 0.6k videos in the train and test sets, respectively. The classes include 'dog barking', 'chopping wood', 'lion roaring', 'skateboarding', etc.
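As a rough illustration of the sampling step (names here are hypothetical, not from any released code), the per-class pick could look like:

```python
import random

# Hypothetical sketch of the selection protocol described above: sample
# 5-15 clips per VGGSound class for manual sparsity annotation.
# `class_to_clips` is an assumed mapping {class name: list of clip ids}.
def sample_for_annotation(class_to_clips, seed=0):
    rng = random.Random(seed)
    return {
        cls: rng.sample(clips, min(len(clips), rng.randint(5, 15)))
        for cls, clips in class_to_clips.items()
    }
```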
BRACE is a dataset for audio-conditioned dance motion synthesis that challenges common assumptions for this task.
The Distress Analysis Interview Corpus/Wizard-of-Oz (DAIC-WOZ) dataset [24, 25] comprises voice and text samples from 189 interviewed persons, both distressed and control participants, together with their PHQ-8 depression questionnaire scores. The dataset is commonly used in research on text-based detection, voice-based detection, and multi-modal architectures.
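When the corpus is used for binary depression detection, a common convention (an assumption about downstream usage, not part of the corpus itself) is to binarize the PHQ-8 total at 10:

```python
# PHQ-8 total >= 10 is the threshold commonly used to derive a binary
# depression label from DAIC-WOZ questionnaire scores (a usage
# convention, not something shipped with the corpus).
def phq8_to_label(phq8_total: int, threshold: int = 10) -> int:
    return int(phq8_total >= threshold)

assert phq8_to_label(7) == 0 and phq8_to_label(14) == 1
```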
The BIWI 3D audiovisual corpus comprises a total of 1109 sentences uttered by 14 native English speakers (6 males and 8 females). A real-time 3D scanner and a professional microphone were used to capture the facial movements and the speech of the speakers. The dense dynamic face scans were acquired at 25 frames per second, and the RMS error of the 3D reconstruction is about 0.5 mm. To ease automatic speech segmentation, the recordings were carried out in an anechoic room with walls covered by sound-absorbing materials.
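Since the scans run at 25 fps, aligning them with the audio is simple arithmetic; a minimal sketch (the sample rate below is an assumption, not the corpus specification):

```python
# Map a 25 fps face-scan frame index to the matching audio sample span.
def scan_frame_to_audio_span(frame_idx: int, sr: int = 44100, fps: int = 25):
    start = round(frame_idx * sr / fps)
    end = round((frame_idx + 1) * sr / fps)
    return start, end

print(scan_frame_to_audio_span(0))  # (0, 1764) at 44.1 kHz
```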
The Song Describer Dataset (SDD) contains ~1.1k captions for 706 permissively licensed music recordings. It is designed for evaluating models that address music-and-language (M&L) tasks such as music captioning, text-to-music generation, and music-language retrieval.
The English data for voice building was obtained, prepared, and provided to the challenge by Lessac Technologies Inc., having originally come from the publisher Voice Factory International Inc. It comprises speech from one female professional narrator and actress, Catherine 'Bobbie' Byers, reading the text of a collection of classic novels. These had been divided by the publishers of the original audiobooks into a number of genres, such as "Classic Novels", "Women's Classics", "Young Readers", and so on.
The Audio Signal and Information Processing Lab at Westlake University, in collaboration with AISHELL, has released the Real-recorded and annotated Microphone Array speech&Noise (RealMAN) dataset, which provides annotated multi-channel speech and noise recordings for dynamic speech enhancement and localization.
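Because speech and noise are provided as separate recordings, enhancement training pairs can be synthesized by mixing them at a chosen SNR; a minimal numpy sketch, with the (channels, samples) array layout assumed rather than taken from the RealMAN spec:

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float):
    # Trim both (channels, samples) arrays to a common length.
    n = min(speech.shape[1], noise.shape[1])
    speech, noise = speech[:, :n], noise[:, :n]
    # Scale the noise so the mixture hits the requested SNR.
    p_s, p_n = np.mean(speech**2), np.mean(noise**2) + 1e-12
    scale = np.sqrt(p_s / (p_n * 10 ** (snr_db / 10)))
    return speech + scale * noise
```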
The Nottingham Dataset is a collection of 1200 American and British folk songs.
The GoodSounds dataset contains around 28 hours of recordings of single notes and scales played by 15 different professional musicians, all of them holding a music degree and having some expertise in teaching. 12 different instruments (flute, cello, clarinet, trumpet, violin, alto sax, tenor sax, baritone sax, soprano sax, oboe, piccolo, and bass) were recorded using between one and four different microphones. For each instrument, the whole set of playable semitones is recorded several times with different tonal characteristics. Each note is recorded into a separate monophonic audio file at 48 kHz and 32 bits. Rich annotations of the recordings are available, including details on the recording environment and ratings of the tonal quality of the sound ("good-sound", "bad", "scale-good", "scale-bad").
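A toy filtering example against the rating annotations (field names here are assumptions; the actual GoodSounds metadata ships in a database):

```python
# Keep only recordings rated "good-sound"; `records` mimics the kind of
# per-file annotation rows described above (field names are assumed).
records = [
    {"file": "flute_a4_take1.wav", "instrument": "flute", "rating": "good-sound"},
    {"file": "flute_a4_take2.wav", "instrument": "flute", "rating": "bad"},
]
good = [r["file"] for r in records if r["rating"] == "good-sound"]
print(good)  # ['flute_a4_take1.wav']
```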
The Bach Doodle Dataset is composed of 21.6 million harmonizations submitted via the Bach Doodle. The dataset contains both metadata about each composition (such as the country of origin and user feedback) and a MIDI of the user-entered melody together with a MIDI of the generated harmonization. In total, the dataset contains about 6 years of user-entered music.
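If a melody/harmonization pair is exported to standard MIDI (an assumed preprocessing step; the dataset's native serialization is different), inspecting it with pretty_midi is straightforward:

```python
import pretty_midi

# Paths are hypothetical exports of one user melody and its harmonization.
melody = pretty_midi.PrettyMIDI("melody_000001.mid")
harmony = pretty_midi.PrettyMIDI("harmonization_000001.mid")
print(len(melody.instruments[0].notes), "melody notes")
print(sum(len(inst.notes) for inst in harmony.instruments), "harmonized notes")
```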
NIPS4Bplus is a richly annotated birdsong audio dataset comprising recordings that contain bird vocalisations, along with species tags and temporal annotations for the active vocalisations. It consists of around 687 recordings covering 87 classes, and the total duration of audio is around one hour.
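The temporal annotations can be rasterized into frame-level targets for training; a sketch assuming (onset, duration, species) tuples and a 20 ms hop, both of which are assumptions rather than the released file format:

```python
import numpy as np

def annotations_to_frames(events, n_frames, hop_s=0.02):
    # events: iterable of (onset_s, duration_s, species) tuples (assumed
    # layout of the annotation rows); returns per-frame activity.
    active = np.zeros(n_frames, dtype=bool)
    for onset, duration, _species in events:
        lo = int(onset / hop_s)
        hi = int(np.ceil((onset + duration) / hop_s))
        active[lo:min(hi, n_frames)] = True
    return active

print(annotations_to_frames([(0.10, 0.30, "Erirub")], n_frames=50).sum())  # 15
```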
dMelodies is a dataset of simple 2-bar melodies generated using 9 independent latent factors of variation, where each data point represents a unique melody based on the following constraints:
- Each melody corresponds to a unique scale (major, minor, blues, etc.).
- Each melody plays arpeggios using the standard I-IV-V-I cadence chord pattern.
- Bar 1 plays the first 2 chords (6 notes); Bar 2 plays the second 2 chords (6 notes).
- Each played note is an 8th note.
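The factor structure makes the dataset size easy to reason about; the cardinalities below are recalled from the dMelodies paper and should be treated as assumptions to check against the official repository:

```python
from math import prod

# Nine factors of variation with assumed cardinalities.
factors = {
    "tonic": 12, "octave": 3, "scale": 3,
    "rhythm_bar1": 28, "rhythm_bar2": 28,
    "arp_chord1": 2, "arp_chord2": 2, "arp_chord3": 2, "arp_chord4": 2,
}
print(prod(factors.values()))  # 1354752 unique melodies (~1.35M)
```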
SONYC-UST-V2 is a dataset for urban sound tagging with spatiotemporal information, aimed at the development and evaluation of machine listening systems for real-world urban noise monitoring. While datasets of urban recordings are available, this dataset provides the opportunity to investigate how spatiotemporal metadata can aid the prediction of urban sound tags. It consists of 18,510 audio recordings from the "Sounds of New York City" (SONYC) acoustic sensor network, each accompanied by the timestamp of audio acquisition and the location of the sensor.
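One simple way to exploit the spatiotemporal metadata (a modeling assumption, not the dataset's official recipe) is to encode the timestamp as cyclic features that can be concatenated to an audio embedding:

```python
import datetime as dt
import math

def time_features(unix_ts: float):
    t = dt.datetime.fromtimestamp(unix_ts)
    # Encode time of day on a circle so 23:59 sits next to 00:00.
    angle = 2 * math.pi * (t.hour + t.minute / 60) / 24
    return [math.sin(angle), math.cos(angle), t.weekday() / 6]

print(time_features(1_500_000_000.0))
```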
MuMu is a dataset of more than 31k albums classified into 250 genre classes.
Warblr is a dataset for the acoustic detection of birds. It comes from a UK bird-sound crowdsourcing research spinout, also called Warblr. From this initiative the authors collected over 10,000 ten-second smartphone audio recordings from around the UK, totalling around 28 hours of audio.
TAU Spatial Sound Events 2019 consists of two datasets, Ambisonic (FOA) and Microphone Array (MIC), of identical sound scenes whose only difference is the audio format. The FOA dataset provides four-channel First-Order Ambisonic recordings, while the MIC dataset provides four-channel directional microphone recordings from a tetrahedral array configuration. Both formats are derived from the same microphone array.
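With the FOA format, a broadband pseudo-intensity vector gives a quick direction-of-arrival estimate; a sketch assuming ACN channel order [W, Y, Z, X], which should be verified against the dataset documentation:

```python
import numpy as np

def foa_doa(frame: np.ndarray):
    # frame: (4, samples) FOA block in assumed ACN order [W, Y, Z, X].
    w, y, z, x = frame
    ix, iy, iz = np.mean(w * x), np.mean(w * y), np.mean(w * z)
    azimuth = np.degrees(np.arctan2(iy, ix))
    elevation = np.degrees(np.arctan2(iz, np.hypot(ix, iy)))
    return azimuth, elevation
```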
FINO-Net is a multimodal (RGB, depth, and audio) dataset containing 229 real-world manipulation recordings of 5 different manipulation types, recorded with a Baxter robot.
Children's Song Dataset is an open-source dataset for singing voice research. It contains 50 Korean and 50 English songs sung by one Korean female professional pop singer. Each song is recorded in two separate keys, resulting in a total of 200 audio recordings. Each audio recording is paired with a MIDI transcription and lyrics annotations at both the grapheme and phoneme levels.
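Collecting the audio/MIDI/lyrics triplet for one song might look like the following (directory layout and file stems are assumptions about the release, not its documented structure):

```python
from pathlib import Path

def song_files(root: Path, song_id: str):
    # Hypothetical layout: wav/, mid/ and txt/ siblings under one language dir.
    return {
        "wav": root / "wav" / f"{song_id}.wav",
        "mid": root / "mid" / f"{song_id}.mid",
        "lyrics": root / "txt" / f"{song_id}.txt",
    }

print(song_files(Path("CSD/english"), "en001"))
```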
The TAU-NIGENS Spatial Sound Events 2021 dataset contains multiple spatial sound-scene recordings, consisting of sound events of distinct categories integrated into a variety of acoustical spaces, and from multiple source directions and distances as seen from the recording position. The spatialization of all sound events is based on filtering through real spatial room impulse responses (RIRs), captured in multiple rooms of various shapes, sizes, and acoustical absorption properties. Furthermore, each scene recording is delivered in two spatial recording formats, a microphone array one (MIC) and a first-order Ambisonics one (FOA). The sound events are spatialized as either stationary sound sources in the room or moving sound sources, in which case time-variant RIRs are used. Each sound event in the sound scene is associated with a single direction-of-arrival (DoA) if static, a trajectory of DoAs if moving, and a temporal onset and offset time. The isolated sound event recordings used for the spatialization are taken from the NIGENS general sound events database.
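Event annotations of this kind are typically distributed as per-clip CSV metadata; a reader sketch assuming a DCASE-style row layout of [frame, class_idx, track_idx, azimuth, elevation] (an assumption to confirm against the dataset README):

```python
import csv

def read_events(path):
    # Yields (frame, class_idx, track_idx, azimuth_deg, elevation_deg).
    with open(path, newline="") as f:
        for frame, cls, track, azi, ele in csv.reader(f):
            yield int(frame), int(cls), int(track), float(azi), float(ele)
```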