3,275 machine learning datasets
The INRIA-Horse dataset consists of 170 horse images and 170 images without horses. Every horse in every image is annotated with a bounding box. The main challenges it offers are clutter, intra-class shape variability, and scale changes. The horses are mostly unoccluded, photographed from an approximately side-on viewpoint, and face the same direction.
The Poser dataset is a pose estimation dataset consisting of 1,927 training and 418 test images. The images were synthetically generated with the Poser software package and tuned to unimodal predictions.
The Retinal Microsurgery dataset is a dataset for surgical instrument tracking. It consists of 18 in-vivo sequences, each with 200 frames of resolution 1920 × 1080 pixels. The dataset is further classified into four instrument-dependent subsets. The annotated tool joints are n=3 and semantic classes c=2 (tool and background).
The L-Bird (Large-Bird) dataset contains nearly 4.8 million images, collected by running Internet image searches for each of 10,982 bird species.
The DispScenes dataset was created to address the specific problem of disparate image matching. The image pairs exhibit high levels of variation in illumination and viewpoint and also contain instances of occlusion. The dataset provides manual ground-truth keypoint correspondences for all images.
MSRA10K is a dataset for salient object detection that provides pixel-level saliency labeling for 10,000 images from the MSRA salient object detection dataset. The original MSRA database provides salient object annotation only as bounding boxes drawn by 3-9 users.
The ICL-NUIM dataset aims at benchmarking RGB-D, Visual Odometry and SLAM algorithms. Two scenes (a living room and an office room) are provided with ground truth. The living room scene has 3D surface ground truth together with depth maps and camera poses, making it suitable not only for benchmarking camera trajectories but also for evaluating reconstruction. The office room scene comes with trajectory data only and has no explicit 3D model.
Middlebury 2003 is a stereo dataset for indoor scenes.
The ISIC 2017 dataset was published by the International Skin Imaging Collaboration (ISIC) as a large-scale dataset of dermoscopy images. The Task 2 challenge dataset for lesion dermoscopic feature extraction contains the original lesion image, a corresponding superpixel mask, and superpixel-mapped expert annotations of the presence and absence of the following features: (a) network, (b) negative network, (c) streaks and (d) milia-like cysts.
The Rendered SST2 dataset, released by OpenAI, measures the optical character recognition capability of visual representations. It takes sentences from the Stanford Sentiment Treebank dataset and renders them into 448×448 images, with black text on a white background.
MUStARD is a multimodal video corpus for research in automated sarcasm discovery. The dataset is compiled from popular TV shows including Friends, The Golden Girls, The Big Bang Theory, and Sarcasmaholics Anonymous. It consists of audiovisual utterances annotated with sarcasm labels, and each utterance is accompanied by its context, which provides additional information on the scenario in which the utterance occurs.
The VIST-Edit dataset includes 14,905 human-edited versions of 2,981 machine-generated visual stories. The stories were generated by two state-of-the-art visual storytelling models, and each machine-generated story is aligned with five human-edited versions.
Extended Labeled Faces in-the-Wild (ELFW) is a dataset that extends the semantic labels originally released for the widely used Labeled Faces in-the-Wild (LFW) dataset with additional face-related categories, as well as additional faces. Two object-based data augmentation techniques are also applied to synthetically enrich under-represented categories; benchmarking experiments show that segmentation improves not only for the augmented categories but for the remaining ones as well.
KANFace consists of 40K still images and 44K sequences (14.5M video frames in total) captured in unconstrained, real-world conditions from 1,045 subjects. The dataset is manually annotated in terms of identity, exact age, gender and kinship.
AV Digits Database is an audiovisual database containing normal, whispered and silent speech. 53 participants were recorded from three different views (frontal, 45°, and profile) while pronouncing digits and phrases in the three speech modes.
SmartCity consists of 50 images in total, collected from ten city scenes including office entrances, sidewalks, atriums, and shopping malls. Unlike existing crowd counting datasets, whose images contain hundreds or thousands of pedestrians and are taken almost exclusively outdoors, SmartCity has few pedestrians per image and covers both outdoor and indoor scenes: the average number of pedestrians is only 7.4, with a minimum of 1 and a maximum of 14.
The UFPR-Eyeglasses dataset has 1,135 images of both eyes (2,270 cropped single-eye images) from 83 subjects (166 classes). The dataset is used to evaluate the effect of occlusion caused by eyeglasses on periocular recognition.
EyeCar is a dataset of driving videos of vehicles involved in rear-end collisions, paired with eye fixation data captured from human subjects. It contains 21 front-view videos captured in various traffic, weather, and daylight conditions. Each video is 30 seconds long and contains typical driving tasks (e.g., lane keeping, merging, and braking) ending in a rear-end collision.
360-SOD contains 500 high-resolution equirectangular images.
Repository of a generative art dataset by computer artist Andy Lomas.