Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

486 machine learning datasets

Filter by Modality

  • Images 3,275
  • Texts 3,148
  • Videos 1,019
  • Audio 486
  • Medical 395
  • 3D 383
  • Time series 298
  • Graphs 285
  • Tabular 271
  • Speech 199
  • RGB-D 192
  • Environment 148
  • Point cloud 135
  • Biomedical 123
  • LiDAR 95
  • RGB Video 87
  • Tracking 78
  • Biology 71
  • Actions 68
  • 3D meshes 65
  • Tables 52
  • Music 48
  • EEG 45
  • Hyperspectral images 45
  • Stereo 44
  • MRI 39
  • Physics 32
  • Interactive 29
  • Dialog 25
  • MIDI 22
  • 6D 17
  • Replay data 11
  • Financial 10
  • Ranking 10
  • CAD 9
  • fMRI 7
  • Parallel 6
  • Lyrics 2
  • PSG 2

486 dataset results

MTic (MADHAVLab Tic)

Periodic tic sounds (T0 = 1 s) sampled at 16 kHz, each with a duration of nearly 10 s.

1 paper · 0 benchmarks · Audio
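The description above pins down the signal shape completely, so a stand-in clip is easy to synthesize. A minimal sketch, assuming each tic is a short click (the 10 ms Hann envelope below is an illustrative choice, not taken from the dataset):

    import numpy as np

    SR = 16_000            # sampling rate from the description (16 kHz)
    T0 = 1.0               # tic period from the description (1 s)
    DURATION = 10.0        # clip length, "nearly 10 s"

    # Place one short click every T0 seconds in an otherwise silent buffer.
    signal = np.zeros(int(SR * DURATION), dtype=np.float32)
    click = np.hanning(int(0.01 * SR)).astype(np.float32)  # assumed 10 ms click shape
    for onset in np.arange(0.0, DURATION, T0):
        start = int(onset * SR)
        signal[start:start + len(click)] += click

    print(signal.shape)    # (160000,) -> 10 s at 16 kHz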

Respiratory and Drug Actuation (RDA) Dataset

Asthma is a common, usually long-term respiratory disease with a negative impact on society and the economy worldwide. Treatment involves medical devices (inhalers) that deliver medication to the airways, and its efficiency depends on the precision of the inhalation technique. Health monitoring systems equipped with sensors and embedded sound-signal detection enable the recognition of drug actuation and could be powerful tools for reliable audio content analysis. The RDA Suite includes a set of tools for audio processing, feature extraction, and classification, and is provided along with a dataset consisting of respiratory and drug actuation sounds. The classification models in RDA are implemented using conventional and advanced machine learning and deep network architectures. This study provides a comparative evaluation of the implemented approaches, examines potential improvements, and discusses challenges and future tendencies. The central aim of this research is to ident…

1 paper · 0 benchmarks · Audio

Pre-Contest Workshop Video Recordings

In this Pre-Contest Workshop Video Recordings folder: …

1 paper · 0 benchmarks · Audio, Videos

Urban Soundscapes of the World

A main goal of the Urban Soundscapes of the World project is to create a reference database of examples of urban acoustic environments, consisting of high-quality immersive audiovisual recordings (360-degree video and spatial audio) in adherence to ISO 12913-2. Ultimately, this database may set the scope for immersive recording and reproduction of urban acoustic environments with soundscape in mind.

1 paper · 0 benchmarks · Audio, Videos

Western Mediterranean Wetlands Birds - Version 2

The Western Mediterranean Wetlands Bird Dataset is a collection of birds' vocalizations of different lengths, primarily consisting of 5,795 labelled audio clips derived from 1,098 recordings, totalling 201.6 minutes (12,096 seconds), along with corresponding annotations. It also comes with a Mel-spectrogram version of the data, where each image represents a 1-second window of the original audio, resulting in a total of 17,536 spectrographic images stored in matrix form within .npy files. These are the species covered: …

1 paper · 0 benchmarks · Audio, Images
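Since the spectrogram images are described as 1-second windows stored as matrices in .npy files, here is a sketch of how one such matrix might be produced. The Mel parameters (n_fft, hop_length, n_mels) and the filename are illustrative assumptions, not specifications from the dataset:

    import numpy as np
    import librosa

    # Hypothetical input clip; the dataset's own clips vary in length.
    y, sr = librosa.load("bird_clip.wav", sr=22050)

    window = y[:sr]                                   # one 1-second window
    mel = librosa.feature.melspectrogram(y=window, sr=sr,
                                         n_fft=1024, hop_length=256, n_mels=64)
    mel_db = librosa.power_to_db(mel, ref=np.max)     # log scale, common for training
    np.save("bird_clip_win000.npy", mel_db)           # matrix form, as in the dataset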

Auditory Detection of Sound (ADS)

A test dataset for unsupervised anomaly detection in sound (ADS).

1 paper · 0 benchmarks · Audio

Baxter-UR5_95-Objects

In this dataset, two robots, Baxter and UR5, perform 8 behaviors (look, grasp, pick, hold, shake, lower, drop, and push) on 95 objects that vary by 5 colors (blue, green, red, white, and yellow), 6 contents (wooden buttons, plastic dice, glass marbles, nuts & bolts, pasta, and rice), and 4 weights (empty, 50 g, 100 g, and 150 g). There are 90 objects with contents (5 colors x 3 weights x 6 contents) and 5 objects without any content that vary only by color. Both robots perform 5 trials on each object, resulting in 7,600 interactions (2 robots x 8 behaviors x 95 objects x 5 trials).

1 paper · 0 benchmarks · Actions, Audio, Images, Interactive, RGB Video, RGB-D, Time series, Videos
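The interaction count follows directly from the full factorial design; a sketch that enumerates it explicitly (object identities reduced to indices):

    from itertools import product

    robots = ["Baxter", "UR5"]
    behaviors = ["look", "grasp", "pick", "hold", "shake", "lower", "drop", "push"]
    objects = range(95)       # 90 objects with contents + 5 empty color-only objects
    trials = range(5)

    interactions = list(product(robots, behaviors, objects, trials))
    print(len(interactions))  # 2 x 8 x 95 x 5 = 7600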

BGG dataset (PUBG Gun Sound Dataset)

We recorded gun sounds, varying the type and position of the guns to diversify distances and angles in the PUBG environment. The BGG dataset consists of 2,195 samples covering 37 different types of guns and five directions, including a silence class in which there is no gunfire but noise is present. The distance from the firearms ranged from 0 to 600 meters. The audio was recorded in stereo (i.e., two-channel audio), and each sample contains various environmental noises (e.g., water splashing, walking, and bullet friction).

1 paper · 0 benchmarks · Audio, Stereo

FSC-P2 (Fearless Steps Challenge Phase2)

The Fearless Steps Initiative by UTDallas-CRSS led to the digitization, recovery, and diarization of 19,000 hours of original analog audio data, as well as the development of algorithms to extract meaningful information from this multichannel naturalistic data resource. As an initial step to motivate a streamlined and collaborative effort from the speech and language community, UTDallas-CRSS is hosting a series of progressively complex tasks to promote advanced research on naturalistic “Big Data” corpora. This began with ISCA INTERSPEECH-2019: "The FEARLESS STEPS Challenge: Massive Naturalistic Audio (FS-#1)". This first edition of the challenge encouraged the development of core unsupervised/semi-supervised speech and language systems for single-channel data with low resource availability, serving as the “First Step” towards extracting high-level information from such massive unlabeled corpora. As a natural progression following the successful inaugural challenge FS#1, the FEARLESS…

1 paper · 0 benchmarks · Audio, Texts

Thorsten voice 21.02 neutral

Thorsten-Voice (Thorsten-21.02-neutral) is a neutrally spoken voice dataset recorded by Thorsten Müller, audio-optimized by Dominik Kreutz, and licensed under CC0 so that anybody can use it without financial or licensing hurdles. It is intended for speech synthesis in German as a single-speaker dataset and contains about 23 hours of high-quality audio.

1 paper · 1 benchmark · Audio
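Single-speaker TTS corpora like this one are often distributed in the LJSpeech layout: a pipe-separated metadata.csv next to a wavs/ directory. Whether Thorsten-21.02-neutral follows that convention exactly is an assumption here, and the extraction path is hypothetical. A minimal loading sketch under those assumptions:

    import csv
    from pathlib import Path

    root = Path("thorsten-21.02-neutral")   # hypothetical extraction path

    # Assumed LJSpeech-style rows: "file_id|transcript" (possibly extra columns).
    with open(root / "metadata.csv", encoding="utf-8") as f:
        rows = [(cols[0], cols[-1]) for cols in csv.reader(f, delimiter="|")]

    file_id, transcript = rows[0]
    print(root / "wavs" / f"{file_id}.wav", transcript)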

Voxforge German

VoxForge is an open speech dataset that was set up to collect transcribed speech for use with Free and Open Source Speech Recognition Engines (on Linux, Windows and Mac).

1 paper · 2 benchmarks · Audio

M-AILabs speech dataset

The M-AILABS Speech Dataset is the first large dataset that we are providing free of charge, freely usable as training data for speech recognition and speech synthesis. Most of the data is based on LibriVox and Project Gutenberg. The training data consist of nearly a thousand hours of audio and text files in a prepared format. A transcription is provided for each clip. Clips vary in length from 1 to 20 seconds, with total lengths approximately as shown in the list below (and in the respective info.txt files). The texts were published between 1884 and 1964 and are in the public domain. The audio was recorded by the LibriVox project and is also in the public domain.

1 paper · 2 benchmarks · Audio
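Given the stated 1-20 s clip range, a simple duration sanity check over an extracted copy of the corpus might look like the following; the directory layout and path are assumptions, not the dataset's documented structure:

    import soundfile as sf
    from pathlib import Path

    for wav in sorted(Path("m-ailabs").rglob("*.wav")):   # hypothetical layout
        info = sf.info(str(wav))
        duration = info.frames / info.samplerate          # seconds
        if not 1.0 <= duration <= 20.0:
            print(f"{wav}: {duration:.1f} s outside the stated 1-20 s range")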

Nlakh

Nlakh is a dataset for Musical Instrument Retrieval. It is a combination of the NSynth dataset, which provides a large number of instruments, and the Lakh dataset, which provides multi-track MIDI data.

1 paper · 0 benchmarks · Audio, Music

Virtuoso Strings

Virtuoso Strings is a dataset for soft-onset detection for string instruments. It consists of over 144 recordings of professional performances of an excerpt from Haydn's string quartet Op. 74 No. 1 Finale, each with corresponding individual instrument onset annotations.

1 paper · 0 benchmarks · Audio, Music
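For a sense of what the onset annotations are measured against, here is a baseline spectral-flux detector run on one recording (the filename is hypothetical). Soft string onsets are precisely where such default detectors tend to miss or drift, which is the gap this dataset targets:

    import librosa

    y, sr = librosa.load("virtuoso_strings_violin1.wav")   # hypothetical filename
    onset_frames = librosa.onset.onset_detect(y=y, sr=sr, units="frames")
    onset_times = librosa.frames_to_time(onset_frames, sr=sr)
    print(onset_times[:10])  # candidate onsets (s) to compare against the annotations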

Datasets for automatic acoustic identification of insects (Orthoptera and Cicadidae)

This dataset contains recordings of 32 sound-producing insect species, with a total of 335 files and a length of 57 minutes. The dataset was compiled for training neural networks to automatically identify insect species while comparing adaptive, waveform-based frontends with conventional mel-spectrogram frontends for audio feature extraction. This work will be submitted for publication, and the dataset can be used to replicate the results, among other uses. The scripts for audio processing and the machine learning implementations will be published on GitHub.

1 paper · 0 benchmarks · Audio, Biology, Environment
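A sketch contrasting the two frontend families the dataset was compiled to compare: a conventional fixed mel-spectrogram frontend versus the raw waveform that an adaptive, learnable frontend would consume directly. Parameters are illustrative and the filename is hypothetical:

    import librosa

    y, sr = librosa.load("insect_recording.wav", sr=16000)   # hypothetical file

    # Conventional frontend: fixed mel filterbank over STFT magnitudes.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)

    # Adaptive waveform-based frontends (e.g. learnable filterbanks) skip this
    # fixed transform and operate on `y` inside the network instead.
    print(y.shape, mel.shape)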

Haydn Annotation Dataset

The Haydn Annotation Dataset consists of note onset annotations from 24 experiment participants with varying musical experience. The annotation experiments use recordings from the ARME Virtuoso Strings Dataset.

1 paper · 0 benchmarks · Audio, Music

IMaSC (ICFOSS Malayalam Speech Corpus)

IMaSC is a Malayalam text and speech corpus made available by ICFOSS for developing speech technology for Malayalam, particularly text-to-speech. The corpus contains 34,473 text-audio pairs of Malayalam sentences spoken by 8 speakers, totalling approximately 50 hours of audio.

1 paper · 0 benchmarks · Audio, Texts

Skit-S2I (Skit-S2I: An Indian Accented Speech to Intent dataset)

This dataset for intent classification from human speech covers 14 coarse-grained intents from the banking domain. The work is inspired by a similar release in the Minds-14 dataset; here, we restrict ourselves to Indian English but with a much larger training set. The data was generated by 11 Indian English speakers and recorded over a telephony line. We also provide access to anonymized speaker information, such as gender, languages spoken, and native language, to allow more structured discussions around robustness and bias in the models you train.

1 paper · 0 benchmarks · Audio, Texts

ASR-RAMC-BIGCCSC: A Chinese Conversational Speech Corpus

A Rich Annotated Mandarin Conversational (RAMC) speech dataset comprising 180 hours of Mandarin Chinese dialogue, with 150, 10, and 20 hours for the training, development, and test sets respectively. It contains 351 multi-turn dialogues, each a coherent and compact conversation centered around one theme.

1 paper · 0 benchmarks · Audio, Texts

UR5 Tool Dataset

In this dataset, a UR5 robot uses 6 tools (metal scissors, metal whisk, plastic knife, plastic spoon, wooden chopstick, and wooden fork) to perform 6 behaviors: look, stirring-slow, stirring-fast, stirring-twist, whisk, and poke. The robot explored 15 objects (cane sugar, chia seeds, chickpeas, detergent, empty, glass beads, kidney beans, metal nuts & bolts, plastic beads, salt, split green peas, styrofoam beads, water, wheat, and wooden buttons) kept in cylindrical containers. The robot performed 10 trials on each object with each tool, resulting in 5,400 interactions (6 tools x 6 behaviors x 15 objects x 10 trials). The robot records multiple sensory data streams (audio, RGB images, depth images, haptics, and touch images) while interacting with the objects.

1 paper · 0 benchmarks · Actions, Audio, Images, Interactive, RGB Video, RGB-D, Time series, Videos
Page 18 of 25