19,997 machine learning datasets
19,997 dataset results
The DIGITS dataset consists of 1797 8×8 grayscale images (1439 for training and 360 for testing) of handwritten digits.
The UT-Interaction dataset contains videos of continuous executions of 6 classes of human-human interactions: shake-hands, point, hug, push, kick and punch. Ground truth labels for these interactions are provided, including time intervals and bounding boxes. There is a total of 20 video sequences whose lengths are around 1 minute. Each video contains at least one execution per interaction, resulting in 8 executions of human activities per video on average. Several participants with more than 15 different clothing conditions appear in the videos. The videos are taken with the resolution of 720*480, 30fps, and the height of a person in the video is about 200 pixels.
The INRIA Aerial Image Labeling dataset is comprised of 360 RGB tiles of 5000×5000px with a spatial resolution of 30cm/px on 10 cities across the globe. Half of the cities are used for training and are associated to a public ground truth of building footprints. The rest of the dataset is used only for evaluation with a hidden ground truth. The dataset was constructed by combining public domain imagery and public domain official building footprints.
A social network of LastFM users which was collected from the public API in March 2020. Nodes are LastFM users from Asian countries and edges are mutual follower relationships between them. The vertex features are extracted based on the artists liked by the users. The task related to the graph is multinomial node classification - one has to predict the location of users. This target feature was derived from the country field for each user.
MIR-1K (Multimedia Information Retrieval lab, 1000 song clips) is a dataset designed for singing voice separation. It contains:
CAL500 (Computer Audition Lab 500) is a dataset aimed for evaluation of music information retrieval systems. It consists of 502 songs picked from western popular music. The audio is represented as a time series of the first 13 Mel-frequency cepstral coefficients (and their first and second derivatives) extracted by sliding a 12 ms half-overlapping short-time window over the waveform of each song. Each song has been annotated by at least 3 people with 135 musically-relevant concepts spanning six semantic categories:
The WIQA dataset V1 has 39705 questions containing a perturbation and a possible effect in the context of a paragraph. The dataset is split into 29808 train questions, 6894 dev questions and 3003 test questions.
Animal-Pose Dataset is an animal pose dataset to facilitate training and evaluation. This dataset provides animal pose annotations on five categories are provided: dog, cat, cow, horse, sheep, with in total 6,000+ instances in 4,000+ images. Besides, the dataset also contains bounding box annotations for other 7 animal categories.
Consists of 20k English biomedical entity mentions from Reddit expert-annotated with links to SNOMED CT, a widely-used medical knowledge graph.
Encourages machine learning research in this area and to help facilitate further work in understanding and mitigating the effects of climate change.
Includes 5,824 fundus images labeled with either positive glaucoma (2,392) or negative glaucoma (3,432).
A centralized benchmark for Linguistic Code-switching Evaluation (LinCE) that combines ten corpora covering four different code-switched language pairs (i.e., Spanish-English, Nepali-English, Hindi-English, and Modern Standard Arabic-Egyptian Arabic) and four tasks (i.e., language identification, named entity recognition, part-of-speech tagging, and sentiment analysis).
A composite dataset that unifies semantic segmentation datasets from different domains.
Publicly available dataset of naturally occurring factual claims for the purpose of automatic claim verification. It is collected from 26 fact checking websites in English, paired with textual sources and rich metadata, and labelled for veracity by human expert journalists.
Contains one million naturally occurring sentence rewrites, providing sixty times more distinct split examples and a ninety times larger vocabulary than the WebSplit corpus introduced by Narayan et al. (2017) as a benchmark for this task.
AIST++ is a 3D dance dataset which contains 3D motion reconstructed from real dancers paired with music. The AIST++ Dance Motion Dataset is constructed from the AIST Dance Video DB. With multi-view videos, an elaborate pipeline is designed to estimate the camera parameters, 3D human keypoints and 3D human dance motion sequences:
COCO-CN is a bilingual image description dataset enriching MS-COCO with manually written Chinese sentences and tags. The new dataset can be used for multiple tasks including image tagging, captioning and retrieval, all in a cross-lingual setting.
FIW is a large and comprehensive database available for kinship recognition. FIW is made up of 11,932 natural family photos of 1,000 families-- nearly 10x more than the next-to-largest, Family-101 database. Also, it contains 656,954 image pairs split between the 11 relationships, which is much larger than the 2nd to largest KinFaceW-II with 2,000 pairs for only 4 kinship types.
JAAD is a dataset for studying joint attention in the context of autonomous driving. The focus is on pedestrian and driver behaviors at the point of crossing and factors that influence them. To this end, JAAD dataset provides a richly annotated collection of 346 short video clips (5-10 sec long) extracted from over 240 hours of driving footage. These videos filmed in several locations in North America and Eastern Europe represent scenes typical for everyday urban driving in various weather conditions.
Stuttering Events in Podcasts (SEP-28k) is a dataset containing over 28k clips labeled with five event types including blocks, prolongations, sound repetitions, word repetitions, and interjections. Audio comes from public podcasts largely consisting of people who stutter interviewing other people who stutter.