3,275 machine learning datasets
Robotic grasp dataset for multi-object, multi-grasp evaluation with RGB-D data. This dataset is annotated using the same protocol as the Cornell Dataset and can be used as a multi-object extension of it.
The YFCC100M Fine-Grained Geolocation dataset is a subset of 36,146 YFCC100M images whose Flickr tags could be identified as corresponding to one of the labels in the iNaturalist 2017 dataset. The selected images satisfy the following criteria: each image has geolocation available, each image has at most one iNaturalist label, and at most ten examples were retained per label.
T2 Guiding is a dataset of 1,000 images, each with six image labels. The images come from the Open Images Dataset (OID), and the dataset includes two sets of machine-generated labels for them.
The dataset contains 26,645 360-degree street-level images collected by cycling with a GoPro Fusion camera, recorded January 14--18, 2020. 10,106 advertisements were identified and classified as food (1,335), alcohol (217), gambling (149), and other (8,405; e.g., cars and broadband).
ImagiFilter focuses on photographic and/or natural images, a very common use case in computer vision research. Annotations are provided for a coarse prediction task, i.e. photographic vs. non-photographic, and for smaller fine-grained prediction tasks in which the non-photographic class is broken down into five classes: maps, drawings, graphs, icons, and sketches.
Cross-Reference Omnidirectional Stitching IQA is a novel omnidirectional image dataset containing stitched images as well as dual-fisheye images captured from the standard quarters of 0°, 90°, 180° and 270°. In this manner, when evaluating the quality of an image stitched from one pair of fisheye images (e.g., 0° and 180°), the other pair of fisheye images (e.g., 90° and 270°) can be used as the cross-reference to provide ground-truth observations of the stitching regions.
DensePose-Track is a dataset of videos where selected frames are annotated in the traditional DensePose manner.
Ciona17 is a semantic segmentation dataset with pixel-level annotations of invasive species in a marine environment. Diverse outdoor illumination, a range of object shapes and colours, and severe occlusion provide a significant real-world challenge for the computer vision community.
MinNav is a synthetic dataset based on the sandbox game Minecraft. The dataset uses several plug-in programs to generate rendered image sequences with time-aligned depth maps, surface normal maps and camera poses. Thanks to the game's large community, there is an extremely large number of 3D open-world environments; users can find suitable scenes for capture and build datasets from them, or build their own scenes in-game.
ARVSU contains a vast body of image variations in visual scenes with an annotated utterance and a corresponding addressee for each scenario.
Event-Stream Dataset is a robotic grasping dataset with 91 objects.
SemanticUSL is a dataset for domain adaptation for LiDAR point cloud semantic segmentation. The dataset has the same data format and ontology as SemanticKITTI.
WildestFaces is tailored to study cross-domain recognition under a variety of adverse conditions.
FAD is a dataset that contains roughly 200,000 facial-attribute labels for over 10,000 facial images.
HARRISON dataset is a benchmark on hashtag recommendation for real world images in social networks. The HARRISON dataset is a realistic dataset, composed of 57,383 photos from Instagram and an average of 4.5 associated hashtags for each photo.
MVB (Multi View Baggage) is a dataset for the baggage ReID task, which differs in some essential respects from person ReID. The features of MVB are three-fold. First, MVB is the first publicly released large-scale dataset of its kind, containing 4,519 baggage identities and 22,660 annotated baggage images along with surface material labels. Second, all baggage images are captured by a specially designed multi-view camera system to handle pose variation and occlusion, in order to capture the 3D information of the baggage surface as completely as possible. Third, MVB exhibits remarkable inter-class similarity and intra-class dissimilarity: different baggage items can look very similar, while the data is collected in two real airport environments whose imaging conditions vary significantly.
The SEmantic Salient Instance Video (SESIV) dataset is obtained by augmenting the DAVIS-2017 benchmark dataset with semantic ground truth for salient instance labels. The SESIV dataset consists of 84 high-quality video sequences with pixel-wise, per-frame ground-truth labels.
The social vision and language dataset is a large-scale multimodal dataset designed for research into social contextual learning.
BigBIRD is a 3D dataset of 125 objects, with multiple kinds of sensor data provided for each object.
The CUHK Face Alignment Database is a dataset of 13,466 face images, of which 5,590 images are from LFW and the remaining 7,876 images are downloaded from the web. Each face is labeled with the positions of five facial keypoints. 10,000 images are used for training and the remaining 3,466 images for validation.