SIDOD is a new, publicly-available image dataset generated by the NVIDIA Deep Learning Data Synthesizer intended for use in object detection, pose estimation, and tracking applications. This dataset contains 144k stereo image pairs that synthetically combine 18 camera viewpoints of three photorealistic virtual environments with up to 10 objects (chosen randomly from the 21 object models of the YCB dataset) and flying distractors.
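To illustrate the combinatorial setup described above, here is a minimal Python sketch that samples one such scene configuration; the function and field names are hypothetical and do not reflect the actual NVIDIA Deep Learning Data Synthesizer API.

```python
import random

# Hypothetical sketch of sampling one SIDOD-style scene configuration:
# a camera viewpoint, an environment, and a random subset of YCB models.
YCB_MODELS = [f"ycb_{i:02d}" for i in range(21)]  # 21 YCB object models

def sample_scene_config(n_viewpoints=18, n_environments=3, max_objects=10):
    return {
        "viewpoint": random.randrange(n_viewpoints),
        "environment": random.randrange(n_environments),
        "objects": random.sample(YCB_MODELS, k=random.randint(1, max_objects)),
        "flying_distractors": True,
    }

print(sample_scene_config())
```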
The simply-CLEVR dataset aims to provide a benchmark for transparent quantitative evaluation of explanation methods (also known as heatmap or XAI methods). It consists of simple Visual Question Answering (VQA) questions derived from the original CLEVR task, where each question is accompanied by two ground-truth masks that serve as a basis for evaluating explanations on the input image.
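One common way to exploit such ground-truth masks is to measure how much of a heatmap's relevance falls inside the mask (relevance mass accuracy). The sketch below is an assumption about how the masks might be used, not the dataset's official evaluation protocol.

```python
import numpy as np

def relevance_mass_accuracy(heatmap, gt_mask):
    """Fraction of (positive) heatmap relevance falling inside the
    ground-truth mask. heatmap: float array HxW; gt_mask: bool array HxW."""
    pos = np.clip(heatmap, 0, None)   # keep positive relevance only
    total = pos.sum()
    if total == 0:
        return 0.0
    return float(pos[gt_mask].sum() / total)

# Toy usage: a 4x4 heatmap scored against a mask covering the top-left 2x2.
hm = np.random.rand(4, 4)
mask = np.zeros((4, 4), dtype=bool)
mask[:2, :2] = True
print(relevance_mass_accuracy(hm, mask))
```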
Contains three types of 2D-3D reasoning tasks on view consistency, camera pose, and shape generation, with increasing difficulty.
SpectroVision is a dataset of 14,400 high-resolution texture images and spectral measurements collected from a PR2 mobile manipulator that interacted with 144 household objects from eight material categories.
This is a large-scale synthetic dataset that simulates the attack scenario of a keystroke inference attack.
TinySocial is a dataset to enable research on Social Visual Question Answering.
The Toulouse Vanishing Points Dataset is a public database of photographs of Manhattan scenes taken with an iPad Air 1. The purpose of this dataset is the evaluation of vanishing-point estimation algorithms. Its originality is the addition of Inertial Measurement Unit (IMU) data synchronized with the camera in the form of rotation matrices. Moreover, unlike existing works, which provide reference vanishing points as single points, this dataset provides computed uncertainty regions.
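A minimal sketch of how such synchronized rotation matrices can be used: mapping the three Manhattan world axes into the camera frame yields predicted vanishing directions and, with intrinsics K, vanishing points. The intrinsics, axis conventions, and direction of the rotation below are assumptions for illustration, not the dataset's documented conventions.

```python
import numpy as np

# Given an IMU rotation matrix R (assumed world-to-camera), map the three
# Manhattan world axes into the camera frame. Each direction d gives a
# vanishing point v ~ K @ d, up to the camera intrinsics K.
R = np.eye(3)                       # placeholder: identity rotation
world_axes = np.eye(3)              # Manhattan directions x, y, z
cam_dirs = (R @ world_axes.T).T     # directions in the camera frame

K = np.array([[800.0, 0.0, 320.0],  # hypothetical iPad-like intrinsics
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
for d in cam_dirs:
    v = K @ d
    if abs(v[2]) > 1e-9:            # finite vanishing point
        print(v[:2] / v[2])
    else:
        print("vanishing point at infinity for direction", d)
```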
This dataset contains two subsets of flood images from Twitter: the Harz17 dataset comprises images from tweets containing flood-related keywords during a flood in the Harz region of Germany in July 2017, and the Rhine18 dataset comprises images related to a flood of the river Rhine in January 2018.
Comprises twenty individuals picking up objects of varying weights and placing them at cabinet and table locations of various heights.
The Visual Discriminative Question Generation (VDQG) dataset contains 11,202 ambiguous image pairs collected from Visual Genome. Each image pair is annotated with, on average, 4.6 discriminative and 5.9 non-discriminative questions.
The VIA dataset is a dataset for aiding the visually impaired. It consists of 342 images divided into two classes: 175 are “clear path” and 167 are “non-clear path”. The images were taken with a smartphone camera and resized to 750 × 1000 pixels. The smartphone was placed at the user's chest height and inclined approximately 30° to 60° from the ground, so it could capture a few meters of the path ahead, beyond the reach of a regular white cane.
The WIDER Attribute dataset is a human attribute recognition dataset with human attribute and image event annotations. Images are selected from the WIDER dataset; there are 13,789 images in total. A bounding box is annotated for each person in these images, with at most 20 people (those at the highest resolutions) annotated per crowded image, resulting in 57,524 boxes in total and 4+ boxes per image on average. For each bounding box, 14 distinct human attributes are labelled, giving 805,336 labels in total.
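The stated totals are mutually consistent: 57,524 boxes × 14 attributes = 805,336 labels, and 57,524 / 13,789 ≈ 4.17 boxes per image. A quick check, plus a hypothetical per-person record whose field names are assumptions:

```python
# Sanity check: one label per attribute per box accounts for the stated totals.
images, boxes, attributes = 13_789, 57_524, 14
assert boxes * attributes == 805_336   # total labels
print(round(boxes / images, 2))        # ~4.17 boxes per image ("4+ on average")

# Hypothetical per-person annotation record (field names are assumptions):
person = {
    "bbox": [10, 20, 50, 120],            # x, y, width, height in pixels
    "attributes": [1, -1, 0] + [0] * 11,  # 14 labels, e.g. 1/-1/0 = pos/neg/unspecified
}
assert len(person["attributes"]) == attributes
```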
The Wiki-Flickr Event dataset is a well-labelled but weakly-aligned dataset collected for cross-modal event retrieval. It consists of 28,825 images from Flickr and 11,960 text articles from hundreds of social media sources, belonging to 82 event categories.
The X-MARS dataset proposes new splits for the MARS dataset, to allow for cross-evaluation with the Market-1501 dataset without training and test overlap between the two datasets.
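The general idea behind such splits is to partition person identities, not individual images, so that no identity appears in both training and test data. The following is a minimal sketch of that idea, not the actual X-MARS split files:

```python
import random

def identity_disjoint_split(person_ids, train_fraction=0.5, seed=0):
    """Split person identities (not images) into disjoint train/test sets,
    so no identity appears on both sides. Illustrates the general idea
    behind X-MARS-style splits; the published split files differ."""
    ids = sorted(set(person_ids))
    random.Random(seed).shuffle(ids)
    cut = int(len(ids) * train_fraction)
    return set(ids[:cut]), set(ids[cut:])

# Toy usage with Market-1501-like identity numbers.
train_ids, test_ids = identity_disjoint_split(range(1, 1502))
assert train_ids.isdisjoint(test_ids)
```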
The Kite database is a multi-modal dataset for the control of unmanned aerial vehicles (UAVs). Three modalities are present in the dataset.
COCO Earthquake is a dataset in the style of Common Objects in Context (COCO) used for crack segmentation. The images in the dataset are at various scales, and the COCO Annotator tool was used to label the cracks for training. In the labeled images, cracks are shown in yellow and the background in purple. Image sizes range from 168×300 to 4600×3070 pixels. Excluding steel structures, 2,021 images in which surface cracks appear on structural or non-structural materials at various scales are labeled.
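Since the labels were produced with the COCO Annotator, the annotations can presumably be read with standard COCO tooling such as pycocotools; the annotation file path below is a placeholder.

```python
from pycocotools.coco import COCO

# Assumes the annotations follow the standard COCO JSON format exported by
# the COCO Annotator; the file path is a placeholder.
coco = COCO("annotations/crack_train.json")
img_id = coco.getImgIds()[0]
ann_ids = coco.getAnnIds(imgIds=img_id)
for ann in coco.loadAnns(ann_ids):
    mask = coco.annToMask(ann)   # binary HxW crack mask for this annotation
    print(ann["category_id"], mask.sum(), "crack pixels")
```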
InstaCities1M is a dataset of social media images with associated text. It consists of Instagram images, each associated with one of the 10 most populated English-speaking cities in the world. It has 100K images per city, for a total of 1M images, split into 800K training, 50K validation, and 150K test images. All images were resized to 300×300 pixels.
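The stated preprocessing can be reproduced with Pillow; the file paths below are placeholders, and the bilinear resampling filter is an assumption since the original resizing method is not specified.

```python
from PIL import Image

# Reproduce the stated preprocessing: resize an image to 300x300 pixels.
img = Image.open("instacities1m/london/000001.jpg").convert("RGB")
img = img.resize((300, 300), Image.BILINEAR)  # resampling filter assumed
img.save("resized/000001.jpg")
```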
The LWIR DoFP Dataset of Road Scene (LDDRS) is a road detection dataset with 2,113 annotated images. It contains both day and night scenes, with multiple cars and pedestrians per image.
Cross Modal Automatic Commenting (CMAC) is a task that aims to automatically generate comments for graphic news. The CMAC dataset is a large-scale dataset for this task, consisting of 24,134 graphic news items. Each instance is composed of several news photos, a news title, the news body, and corresponding high-quality comments.
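A hypothetical record structure for one CMAC instance, matching the composition described above; the field names are assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass

# Hypothetical structure of one CMAC instance; field names are assumptions.
@dataclass
class CMACInstance:
    photos: list[str]     # paths or URLs of the news photos
    title: str            # news title
    body: str             # news body text
    comments: list[str]   # high-quality reference comments

example = CMACInstance(
    photos=["photo_0.jpg", "photo_1.jpg"],
    title="Example headline",
    body="Example body text ...",
    comments=["Example comment."],
)
print(len(example.photos), "photos,", len(example.comments), "comments")
```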
Processed Twitter is a dataset used for Twitter topic recognition. It contains tweets covering 6 different topics.