The Couples Therapy corpus contains audio, video recordings, and manual transcriptions of conversations between 134 real-life couples attending marital therapy. In each session, one spouse selected a topic that the couple then discussed for 10 minutes. At the end of the session, each speaker was rated separately on 33 “behavior codes” by multiple annotators, based on the Couples Interaction and Social Support Rating Systems. Each behavior was rated on a Likert scale from 1, indicating absence, to 9, indicating strong presence, and a session-level rating was obtained for each speaker by averaging the annotator ratings. The process was then repeated with the other spouse choosing the topic, yielding 2 sessions per couple per visit; the total number of sessions per couple varied between 2 and 6.
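As a minimal sketch of the session-level aggregation described above, consider a long-format table with one row per (session, speaker, behavior code, annotator); all column names and values here are hypothetical, not the corpus schema.

```python
import pandas as pd

# Hypothetical long-format ratings: one row per annotator judgment.
ratings = pd.DataFrame({
    "session":   [1, 1, 1, 1],
    "speaker":   ["wife", "wife", "husband", "husband"],
    "code":      ["acceptance"] * 4,
    "annotator": ["A", "B", "A", "B"],
    "rating":    [6, 8, 3, 5],   # Likert scale, 1 (absent) to 9 (strong presence)
})

# One session-level score per speaker and behavior code:
# the mean of the annotator ratings.
session_scores = ratings.groupby(["session", "speaker", "code"])["rating"].mean()
print(session_scores)
```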
The TAU Spatial Sound Events 2019 - Ambisonic dataset contains sound scene recordings (along with its Microphone Array sister dataset). It provides four-channel First-Order Ambisonic (FOA) recordings. Each recording consists of stationary point sources from multiple sound classes, each associated with a temporal onset and offset time and a DOA coordinate given as azimuth and elevation angles. The development set consists of 400 one-minute recordings sampled at 48000 Hz, divided into four cross-validation splits of 100 recordings each. The recordings were synthesized using spatial room impulse responses (IRs) collected at five indoor locations, at 504 unique azimuth-elevation-distance combinations. To synthesize a recording, the collected IRs were convolved with isolated sound events from the DCASE 2016 task 2 dataset. Finally, to create a realistic sound scene recording, natural ambient noise collected at the IR recording locations was added to the synthesized recordings.
The TAU Spatial Sound Events 2019 – Microphone Array dataset contains sound scene recordings (along with its Ambisonic sister dataset). It provides four-channel directional microphone recordings from a tetrahedral array configuration. Each recording consists of stationary point sources from multiple sound classes, each associated with a temporal onset and offset time and a DOA coordinate given as azimuth and elevation angles. The development set consists of 400 one-minute recordings sampled at 48000 Hz, divided into four cross-validation splits of 100 recordings each. The recordings were synthesized using spatial room impulse responses (IRs) collected at five indoor locations, at 504 unique azimuth-elevation-distance combinations. To synthesize a recording, the collected IRs were convolved with isolated sound events from the DCASE 2016 task 2 dataset. Finally, to create a realistic sound scene recording, natural ambient noise collected at the IR recording locations was added to the synthesized recordings.
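The synthesis procedure shared by both sister datasets can be illustrated roughly as follows: a dry isolated event is convolved with a measured multichannel spatial IR, and location-matched ambient noise is mixed in. The signals and gain below are stand-ins, not the official recipe or its SNR settings.

```python
import numpy as np
from scipy.signal import fftconvolve

fs = 48000
event = np.random.randn(fs)               # stand-in for a dry DCASE 2016 task 2 event (1 s)
ir = np.random.randn(fs // 2, 4) * 0.01   # stand-in for a 4-channel spatial IR (FOA or mic array)

# Convolve the mono event with each IR channel to spatialize it.
spatialized = np.stack(
    [fftconvolve(event, ir[:, ch]) for ch in range(ir.shape[1])], axis=1
)

# Add location-matched ambient noise at an illustrative level.
ambience = np.random.randn(*spatialized.shape) * 0.005
scene = spatialized + ambience
```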
Bach chorales is a univariate time series dataset based on chorales, where the task is to learn a generative grammar. The dataset consists of the single-line melodies of 100 Bach chorales (originally in 4 voices); the melody line can be studied independently of the other voices. The grand challenge is to learn a generative grammar for stylistically valid chorales.
Robbie Williams is a dataset of 65 songs by Robbie Williams. It consists of chord, key, and beat annotations; the dataset does not include audio.
ISMIR2004 is an audio dataset for musical genre classification, consisting of 729 thirty-second excerpts across 6 genres. The training set consists of 320 classical, 115 electronic, 26 jazz/blues, 45 metal/punk, 101 rock/pop, and 122 world music samples.
CLO-43SD is a dataset for multi-class species identification in avian flight calls. It consists of 5,428 labeled audio clips of flight calls from 43 species of North American wood-warblers (family Parulidae). The clips cover a variety of recording conditions, including clean recordings made with highly directional shotgun microphones, noisier field recordings made with omnidirectional microphones, and recordings of birds in captivity.
CLO-WTSP is a dataset for species-specific flight call identification for the White-Throated Sparrow. It consists of 16,703 labeled audio clips captured by remote acoustic sensors deployed in Ithaca, NY and NYC over the fall 2014 and spring 2015 migration seasons. Each clip is labeled to indicate whether it contains a flight call from the target species, White-Throated Sparrow (WTSP), a flight call from a non-target species, or no flight call at all.
CLO-SWTH is a dataset for species-specific flight call identification for the Swainson's Thrush. It consists of 179,111 labeled audio clips captured by remote acoustic sensors deployed in Ithaca, NY and NYC over the fall 2014 and spring 2015 migration seasons. Each clip is labeled to indicate whether it contains a flight call from the target species, Swainson's Thrush (SWTH), a flight call from a non-target species, or no flight call at all.
Freefield1010 is a collection of 7,690 excerpts from field recordings around the world, gathered by the FreeSound project, and then standardised for research.
warblrb10k is a collection of 10,000 smartphone audio recordings from around the UK, crowdsourced by users of the bird recognition app Warblr. The audio covers a wide distribution of UK locations and environments, and includes weather noise, traffic noise, human speech, and even human imitations of bird calls.
Chernobyl is a collection of 620 audio clips collected from unattended remote monitoring equipment in the Chernobyl Exclusion Zone (CEZ). This data was collected as part of the TREE (Transfer-Exposure-Effects) research project into the long-term effects of the Chernobyl accident on local ecology. The audio covers a range of birds and includes weather, large mammal and insect noise sampled across various CEZ environments, including abandoned village, grassland and forest areas.
PolandNFC is a collection of 4,000 recordings from Hanna Pamuła's PhD project on monitoring autumn nocturnal bird migration. The recordings were collected every night from September to November 2016 on the Baltic Sea coast of Poland, using Song Meter SM2 units with microphones mounted on 3–5 m poles. A subset drawn from 15 nights with different weather conditions and background noise, including wind, rain, sea noise, insect calls, human voice, and deer calls, was used in the DCASE 2018 Challenge.
The BirdVox-DCASE-20k dataset contains 20,000 ten-second audio recordings. These recordings come from ROBIN autonomous recording units placed near Ithaca, NY, USA during fall 2015. They were captured on the night of September 23rd, 2015, by six different sensors, originally numbered 1, 2, 3, 5, 7, and 10. Of these 20,000 recordings, 10,017 (50.09%) contain at least one bird vocalization (song, call, or chatter). The dataset is a derivative work of the BirdVox-full-night dataset, containing almost as much data but formatted as ten-second excerpts rather than ten-hour full-night recordings.
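A minimal sketch of how a full-night recording could be cut into ten-second excerpts like those described above; the input path and output naming scheme are assumptions for illustration, not the BirdVox tooling.

```python
import soundfile as sf

CLIP_SECONDS = 10

# Hypothetical full-night recording from one sensor.
with sf.SoundFile("full_night_unit01.flac") as f:
    clip_frames = CLIP_SECONDS * f.samplerate
    # Read the file in non-overlapping ten-second blocks and write each
    # block out as its own excerpt.
    for i, block in enumerate(f.blocks(blocksize=clip_frames)):
        sf.write(f"unit01_clip{i:04d}.wav", block, f.samplerate)
```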
DBR is an environmental audio dataset created for the Bachelor's Seminar in Signal Processing at Tampere University of Technology. The samples in the dataset were collected from the online audio database Freesound. The dataset consists of three classes, each containing 50 samples; the classes are 'dog', 'bird', and 'rain' (hence the name DBR).
The FSL4 dataset contains ~4000 user-contributed loops uploaded to Freesound. Loops were selected by searching Freesound for sounds with the query terms "loop" and "bpm", and then automatically parsing the returned sound filenames, tags, and textual descriptions to identify tempo annotations made by users. For example, a sound containing the tag 120bpm is considered to have a ground-truth tempo of 120 BPM.
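A toy version of the tempo-annotation mining described above: scan tags, filenames, and descriptions for patterns like "120bpm" and treat the number as the ground-truth BPM. The pattern and plausibility range below are illustrative; the exact matching rules used for FSL4 may differ.

```python
import re

# Matches annotations like "120bpm", "90 BPM", "drum-loop_128bpm.wav".
BPM_PATTERN = re.compile(r"(\d{2,3})\s*bpm", re.IGNORECASE)

def extract_bpm(texts):
    """Return the first plausible BPM found in tags/filenames/descriptions, else None."""
    for text in texts:
        match = BPM_PATTERN.search(text)
        if match:
            bpm = int(match.group(1))
            if 30 <= bpm <= 300:      # discard implausible tempi (assumed range)
                return bpm
    return None

print(extract_bpm(["drum-loop_120bpm.wav", "loop", "percussion"]))  # -> 120
```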
SimSceneTVB is a dataset of 600 simulated sound scenes, each 45 s long, representing urban sound environments, generated with the simScene Matlab library. The dataset is divided into two parts, a train subset (400 scenes) and a test subset (200 scenes), for the development of learning-based models. Each scene is composed of three main source types (traffic, human voices, and birds) according to an original scenario, generated semi-randomly conditioned on one of five ambiances: park, quiet street, noisy street, very noisy street, and square. Separate channels with the contribution of each source are available. The base audio files used for simulation are obtained from Freesound (https://freesound.org) and LibriSpeech (http://www.openslr.org/12). Each sound scene is scaled according to a playback sound level in dB, which is drawn randomly but remains plausible for the ambiance.
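A small sketch of the level scaling mentioned above: a playback level in dB is drawn randomly within a plausible range and converted to a linear gain. The range, normalization scheme, and function name here are illustrative assumptions, not the simScene implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def scale_to_playback_level(scene, level_db_range=(-30.0, -10.0)):
    # Draw a random but plausible playback level (assumed range, in dBFS).
    level_db = rng.uniform(*level_db_range)
    # Convert dB to a linear amplitude gain: gain = 10^(dB / 20).
    gain = 10.0 ** (level_db / 20.0)
    # Peak-normalize, then apply the target gain.
    peak = np.max(np.abs(scene))
    return scene / peak * gain

scene = rng.standard_normal(45 * 48000)   # stand-in 45 s scene at 48 kHz
scaled = scale_to_playback_level(scene)
```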
SimSceneTVB Perception is a corpus of 100 sound scenes, each 45 s long, representing urban sound environments: 6 scenes recorded in Paris, 19 scenes simulated with simScene to replicate the recorded scenarios, and 75 scenes simulated with simScene from diverse new scenarios, all containing traffic, human voice, and bird sources. The base audio files used for simulation are obtained from Freesound (https://freesound.org) and LibriSpeech (http://www.openslr.org/12).
The Sound Events for Surveillance Applications (SESA) dataset files were obtained from Freesound. The dataset is divided into train (480 files) and test (105 files) folders. All audio files are mono WAV at 16 kHz and 8 bits, with durations of up to 33 seconds. The dataset has four classes: 0 - Casual (not a threat), 1 - Gunshot, 2 - Explosion, and 3 - Siren (which also contains alarms).
The TUT Rare Sound Events 2017 development dataset consists of source files for creating mixtures of rare sound events (classes: baby cry, gun shot, glass break) with background audio, as well as a set of readily generated mixtures and the recipes for generating them. The source part of the dataset consists of three subsets: (a) background recordings from 15 different acoustic scenes, (b) recordings of the target rare sound events from the three classes, accompanied by annotations of their temporal occurrences, and (c) a set of meta files providing the cross-validation setup: lists of background and target event recordings split into training and test subsets (called "devtrain" and "devtest", respectively, indicating that they are provided as the development dataset, as opposed to the separately released evaluation dataset). The mixture set consists of two subsets (training and testing), each containing ~1500 mixtures (~500 per target class in each subset), with half of the mixtures not containing any target sound event.
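A simplified version of the mixture recipe described above: a target event is mixed into a background recording at a random onset and a chosen event-to-background ratio (EBR), while mixtures without an event are just the background. The EBR definition, function name, and parameter values are illustrative assumptions, not the official recipe.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mixture(background, event=None, ebr_db=0.0):
    """Mix an event into a background at a random onset; event=None yields a negative mixture."""
    mixture = background.copy()
    onset = None
    if event is not None and len(event) < len(background):
        onset = int(rng.integers(0, len(background) - len(event)))
        # Scale the event so its RMS sits ebr_db above/below the background RMS.
        bg_rms = np.sqrt(np.mean(background ** 2))
        ev_rms = np.sqrt(np.mean(event ** 2))
        gain = bg_rms / ev_rms * 10.0 ** (ebr_db / 20.0)
        mixture[onset:onset + len(event)] += gain * event
    return mixture, onset   # the onset doubles as the temporal annotation

fs = 44100
background = rng.standard_normal(30 * fs) * 0.1   # stand-in acoustic-scene background
event = rng.standard_normal(2 * fs) * 0.1         # stand-in "glass break" event
mix, onset = make_mixture(background, event, ebr_db=0.0)
```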