The ISIC 2018 dataset was published by the International Skin Imaging Collaboration (ISIC) as a large-scale dataset of dermoscopy images. The Task 2 dataset is the challenge on lesion attribute detection. It includes 2594 images. The task is to detect the following dermoscopic attributes: pigment network, negative network, streaks, milia-like cysts and globules (including dots).
VisPro dataset contains coreference annotation of 29,722 pronouns from 5,000 dialogues.
The AuDio Visual Aerial sceNe reCognition datasEt (ADVANCE) is a new multimodal learning dataset that aims to explore the contribution of both audio and conventional visual signals to scene recognition. In total, the dataset contains 5075 pairs of geotagged aerial images and sounds, classified into 13 scene classes: airport, sports land, beach, bridge, farmland, forest, grassland, harbor, lake, orchard, residential area, shrub land, and train station.
Lytro Illum is a light field dataset captured with a Lytro Illum camera. It contains 640 light fields with significant variations in size, texture, background clutter and illumination. Micro-lens image arrays and central-view images are generated, and corresponding ground-truth maps are provided.
AO-CLEVr is a new synthetic-images dataset containing images of "easy" Attribute-Object categories, based on CLEVr. AO-CLEVr has attribute-object pairs created from 8 attributes: { red, purple, yellow, blue, green, cyan, gray, brown } and 3 object shapes {sphere, cube, cylinder}, yielding 24 attribute-object pairs. Each pair consists of 7500 images. Each image contains a single object exhibiting the attribute-object pair. The object is randomly assigned one of two sizes (small/large), one of two materials (rubber/metallic), a random position, and random lighting according to CLEVr defaults.
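As a rough illustration, the pair space described above can be enumerated directly; this is a minimal sketch using the attribute and shape vocabularies from the description, not any official AO-CLEVr loader.

```python
from itertools import product

# Attribute and shape vocabularies as listed in the AO-CLEVr description.
attributes = ["red", "purple", "yellow", "blue", "green", "cyan", "gray", "brown"]
shapes = ["sphere", "cube", "cylinder"]

# Every attribute-object pair: 8 attributes x 3 shapes = 24 pairs.
pairs = list(product(attributes, shapes))
print(len(pairs))  # 24

# With 7500 images per pair, the dataset totals 24 * 7500 images.
images_per_pair = 7500
print(len(pairs) * images_per_pair)  # 180000
```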
ChineseFoodNet aims to automatically recognize pictured Chinese dishes. Most existing food image datasets collect images either from recipe pictures or from selfies. In this dataset, each food category includes not only web recipe and menu pictures but also photos taken of real dishes, recipes and menus. ChineseFoodNet contains over 180,000 food photos in 208 categories, with each category covering large variations in the presentation of the same Chinese dish.
CITE is a crowd-sourced resource for multimodal discourse: this resource characterises inferences in image-text contexts in the domain of cooking recipes in the form of coherence relations.
The EuroCity Persons dataset provides a large number of highly diverse, accurate and detailed annotations of pedestrians, cyclists and other riders in urban traffic scenes. The images for this dataset were collected on-board a moving vehicle in 31 cities of 12 European countries. With over 238,200 person instances manually labeled in over 47,300 images, EuroCity Persons is nearly one order of magnitude larger than person datasets used previously for benchmarking. The dataset furthermore contains a large number of person orientation annotations (over 211,200).
Falling Things (FAT) is a dataset for advancing the state-of-the-art in object detection and 3D pose estimation in the context of robotics. It consists of 60k generated photorealistic images with accurate 3D pose annotations for all objects.
FFHQ-Aging is a dataset of human faces designed for benchmarking age-transformation algorithms as well as many other possible vision tasks. It is an extension of the NVIDIA FFHQ dataset: on top of the 70,000 original FFHQ images, it also provides the following information for each image: * Gender (male/female, with confidence score) * Age group (10 classes, with confidence score) * Head pose (pitch, roll and yaw) * Glasses type (none, normal or dark) * Eye occlusion score (0-100, separate score for each eye) * Full semantic map (19 classes, based on CelebAMask-HQ labels)
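The per-image annotations listed above can be sketched as a record type. The field names below are illustrative assumptions, not FFHQ-Aging's actual label-file schema.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical record mirroring the annotation fields described above;
# the real FFHQ-Aging label files may use different names and layout.
@dataclass
class FFHQAgingAnnotation:
    gender: str                            # "male" or "female"
    gender_confidence: float               # confidence score for the gender label
    age_group: int                         # one of 10 age-group classes
    age_confidence: float                  # confidence score for the age group
    head_pose: Tuple[float, float, float]  # (pitch, roll, yaw)
    glasses: str                           # "none", "normal", or "dark"
    left_eye_occlusion: int                # 0-100 occlusion score, left eye
    right_eye_occlusion: int               # 0-100 occlusion score, right eye
    semantic_map_path: str                 # 19-class map (CelebAMask-HQ labels)

example = FFHQAgingAnnotation(
    gender="female", gender_confidence=0.98,
    age_group=3, age_confidence=0.87,
    head_pose=(2.1, -0.4, 5.3),
    glasses="none",
    left_eye_occlusion=0, right_eye_occlusion=0,
    semantic_map_path="masks/00000.png",
)
```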
iCubWorld datasets are collections of images recording the visual experience of iCub while observing objects in its typical environment, a laboratory or an office. The acquisition setting is devised to allow a natural human-robot interaction, where a teacher verbally provides the label of the object of interest and shows it to the robot, by holding it in the hand; the iCub can either track the object while the teacher moves it, or take it in its hand.
IIIT-AR-13K is created by manually annotating the bounding boxes of graphical or page objects in publicly available annual reports. This dataset contains a total of 13k annotated page images with objects in five different popular categories - table, figure, natural image, logo, and signature. It is the largest manually annotated dataset for graphical object detection.
MatterportLayout extends the Matterport3D dataset with general Manhattan layout annotations. It has 2,295 RGBD panoramic images from Matterport3D which are extended with ground truth 3D layouts.
Contains video clips shot with modern high-resolution mobile cameras, with strong projective distortions and under low-light conditions.
The Natural-Color Dataset (NCD) is an image colorization dataset where images are true to their colors. For example, a carrot will have an orange color in most images. Bananas will be either greenish or yellowish. It contains 723 images from the internet distributed in 20 categories. Each image has an object and a white background.
The Relative Size dataset contains 486 object pairs between 41 physical objects. Size comparisons are not available for all pairs of objects (e.g. bird and watermelon) because for some pairs humans cannot determine which object is bigger.
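A quick sanity check on the numbers above (a sketch, not code from the dataset's release): 41 objects admit C(41, 2) = 820 unordered pairs, so the 486 annotated comparisons cover only part of the possible pair space.

```python
from math import comb

n_objects = 41
annotated_pairs = 486

# Unordered pairs of distinct objects: C(41, 2) = 820.
possible_pairs = comb(n_objects, 2)
print(possible_pairs)  # 820

# Fraction of object pairs for which a size comparison exists.
coverage = annotated_pairs / possible_pairs
print(f"{coverage:.1%}")  # 59.3%
```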
The Urban Environments dataset covers 20 land use classes across 300 European cities, paired with satellite imagery.
WHOI-Plankton is a collection of annotated plankton images. It contains > 3.5 million images of microscopic marine plankton, organized according to category labels provided by researchers at the Woods Hole Oceanographic Institution (WHOI). The images are currently placed into one of 103 categories.
A new dataset with significant occlusions related to object manipulation.
Dataset for large-scale yoga pose recognition with 82 classes.