19,997 machine learning datasets
19,997 dataset results
The football keyword dataset (FKD), as a new keyword spotting dataset in Persian, is collected with crowdsourcing. This dataset contains nearly 31000 samples in 18 classes.
The C-SSRS dataset contains 500 Reddit posts from the subreddit r/depression. These posts are labeled by psychologists on a five point scale according to guidelines established in the Columbia Suicide Severity Rating Scale, which progress according to severity of depression. As this dataset is clinically verified and labeled, it is an adequate dataset to validate the label correction method, especially since it is from the same domain of mental health.
AraCOVID19-MFH is a manually annotated multi-label Arabic COVID-19 fake news and hate speech detection dataset. The dataset contains 10,828 Arabic tweets annotated with 10 different labels.
Global Symmetry Ground-truth for AVA dataset.
R2VQ is a dataset designed for testing competence-based comprehension of machines over a multimodal recipe collection, which contains text-video aligned recipes.
DBATES is a database of multimodal communication features extracted from debate speeches in the 2019 North American Universities Debate Championships (NAUDC).
The fetoscopy placenta dataset is associated with our MICCAI2020 publication titled “Deep Placental Vessel Segmentation for Fetoscopic Mosaicking”. The dataset contains 483 frames with ground-truth vessel segmentation annotations taken from six different in vivo fetoscopic procedure videos. The dataset also includes six unannotated in vivo continuous fetoscopic video clips (950 frames) with predicted vessel segmentation maps obtained from the leave-one-out cross-validation of our method.
Fusion-DHL is a multimodal sensor dataset with ground-truth positions.
This repository contains essays written by high school Brazilian students. These essays were graded by humans professionals following the criteria of the ENEM exam.
Data annotation The 1,073 full rare disease mention annotations (from 312 MIMIC-III discharge summaries) are in full_set_RD_ann_MIMIC_III_disch.csv.
ReactionGIF is an affective dataset of 30K tweets which can be used for tasks like induced sentiment prediction and multilabel classification of induced emotions.
EPISURG is a clinical dataset of $T_1$-weighted magnetic resonance images (MRI) from 430 epileptic patients who underwent resective brain surgery at the National Hospital of Neurology and Neurosurgery (Queen Square, London, United Kingdom) between 1990 and 2018.
The original paper presented a model of the industrial chemical process named Tennessee Eastman Process and a model-based TEP simulator for data generation. The most widely used benchmark consists of 22 datasets, 21 of which (Fault 1–21) contain faults and 1 (Fault 0) is fault-free. It is available in repository. All datasets have training (500 samples) and testing (960 samples) parts: training part has healthy state observations, testing part begins right after training, and contains faults which appear after 8 h since the training part. Each dataset has 52 features or observation variables with a 3 min sampling rate for most of all.
POINTREC is a test collection for point of interest (POI) recommendation, comprising of (i) a set of information needs, (ii) a dataset of POIs, and (iii) graded relevance assessments for information need and POI pairs.
SkyCam dataset is a collection of sky images from a variety of locations with diverse topological characteristics (Swiss Jura, Plateau and Pre-Alps regions), from both single and stereo camera settings coupled with a high-accuracy pyranometers. The dataset was collected with a high frequency with a data sample every 10 seconds. 13 images with different exposures times are generated along with a post-processed HDR images and a solar radiance values for each of the cameras and locations. We hope that SkyCam dataset will enable researchers to tackle the problem of short-term local camera-based solar radiance prediction.
The RISE (Robust Indoor Localization in Complex Scenarios) dataset is meant to train and evaluate visual indoor place recognizers. It contains more than 1 million geo-referenced images spread over 30 sequences, covering 5 heterogeneous buildings. For each building we provide: - A high resolution 3D point cloud (1cm) that defines the localization reference frame and that was generated with a mobile laser scanner and an inertial system. - Several image sequences spread over time with accurate ground truth poses retrieved by the laser scanner. Each sequence contains both, stereo pairs and spherical images. - Geo-referenced smartphone data, retrieved from the standard sensors of such devices.
SynthDerm is a synthetically generated dataset inspired by the real-world characteristics of melanoma skin lesions in dermatology settings. These characteristics include whether the lesion is asymmetrical, its border is irregular or jagged, is unevenly colored, has a diameter more than 0.25 inches, or is evolving in size, shape, or color over time. These qualities are usually referred to as ABCDE of melanoma. We generate SynthDerm algorithmically by varying several factors: skin tone, lesion shape, lesion size, lesion location (vertical and horizontal), and whether there are surgical markings present. We randomly assign one of the following to the lesion shape: round, asymmetrical, with jagged borders, or multi-colored (two different shades of colors overlaid with salt-and-pepper noise). For skin tone values, we simulate Fitzpatrick ratings. Fitzpatrick scale is a commonly used approach to classify the skin by its reaction to sunlight exposure modulated by the density of melanin pigmen
Instantiation is a dataset for the task of instantiation detection
UW-IS (UW Indoor Scenes) is a dataset for object recognition in indoor environments comprising scene images from two different environments, namely, a living room and a mock warehouse.
This experiment was performed in order to empirically measure the energy use of small, electric Unmanned Aerial Vehicles (UAVs). We autonomously direct a DJI ® Matrice 100 (M100) drone to take off, carry a range of payload weights on a triangular flight pattern, and land. Between flights, we varied specified parameters through a set of discrete options, payload of 0 , 250 g and 500 g; altitude during cruise of 25 m, 50 m, 75 m and 100 m; and speed during cruise of 4 m/s, 6 m/s, 8 m/s, 10 m/s and 12 m/s.