3,275 machine learning datasets
Relations in Captions (REC-COCO) is a new dataset that contains associations between caption tokens and bounding boxes in images. REC-COCO is based on the MS-COCO and V-COCO datasets. For each image in V-COCO, we collect its corresponding captions from MS-COCO and automatically align the concept triplet in V-COCO to the tokens in the caption. This requires finding the token for concepts such as PERSON. As a result, REC-COCO contains the captions and the tokens which correspond to each subject and object, as well as the bounding boxes for the subject and object.
Given the difficulty of handling planetary data, we provide downloadable files in PNG format from the Chang'E-3 and Chang'E-4 missions, together with a set of scripts to perform the conversion for a different PDS4 dataset.
GQN rooms-ring-camera consists of scenes with a variable number of random objects captured in a square room of size 7x7 units. Wall textures, floor textures, and the shapes of the objects are randomly chosen from a fixed pool of discrete options. There are 5 possible wall textures (red, green, cerise, orange, yellow), 3 possible floor textures (yellow, white, blue) and 7 possible object shapes (box, sphere, cylinder, capsule, cone, icosahedron and triangle). Each scene contains 1, 2 or 3 objects. In this simplified version of the dataset, the camera only moves on a fixed ring and always faces the center of the room.
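The scene parameters listed above can be sketched as a small random sampler. This is only an illustration of the option pools described in the text, not the dataset's actual generation code: the function name `sample_scene`, the dictionary layout, and the choice to draw object shapes independently (so a shape may repeat within a scene) are assumptions.

```python
import random

# Option pools exactly as listed in the description.
WALL_TEXTURES = ["red", "green", "cerise", "orange", "yellow"]
FLOOR_TEXTURES = ["yellow", "white", "blue"]
OBJECT_SHAPES = ["box", "sphere", "cylinder", "capsule",
                 "cone", "icosahedron", "triangle"]


def sample_scene(rng):
    """Draw one hypothetical scene configuration.

    Assumption: shapes are drawn independently, so the same shape
    may appear more than once in a scene.
    """
    n_objects = rng.randint(1, 3)  # each scene contains 1, 2 or 3 objects
    return {
        "wall": rng.choice(WALL_TEXTURES),
        "floor": rng.choice(FLOOR_TEXTURES),
        "objects": [rng.choice(OBJECT_SHAPES) for _ in range(n_objects)],
    }


scene = sample_scene(random.Random(0))
print(scene)
```

Passing an explicit `random.Random` instance keeps the sampling reproducible without touching the global random state.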
A total of 80 real material samples were captured in a dark room. For each material, multiple captures were collected at different distances from the camera (between 250 and 650 mm) to observe both macro- and micro-level details. The dataset consists mostly of planar specimens but also includes non-planar objects such as mugs, globes, crumpled paper, etc. It contains a rich diversity of materials, including diffuse or specular wrapping papers, fabrics, anisotropic metals, plastics, rugs, ceramic and wood flooring samples, etc. Each capture set includes 12 LDR (8 bpp) RGB-D images at 4K pixel resolution. Each set is captured at 50% and 100% of maximum light intensity. In total, we captured 462 such image sets (combinations of light intensities, distances to the camera, and material sample).
ACFR Orchard Fruit Dataset is an agricultural dataset containing images and annotations for different fruits, collected at different farms across Australia. The dataset was gathered by the agriculture team at the Australian Centre for Field Robotics, The University of Sydney, Australia.
To validate the generalization abilities of SOD models, we create a small-scale dataset by collecting the most challenging images, featuring varying brightness and contrast and overlapping background and foreground colors, among many other difficulties. We conclude that the current models, including ours, are not trustworthy for real-world practice, demanding extensive future research for more efficient and generalized SOD models.
A database of 56 high-quality fabric material measurements, provided as carefully calibrated, rectified HDR images, together with SVBRDF fits. Used in the Fabric Appearance Challenge.
The Deep Thermal Imaging dataset consists of two main datasets:
Motion similarity annotations for the NTU RGB+D 120 dataset, used to evaluate motion similarity in the real world.
The training and validation data are subsets of the ImageNet 2012 training split. The test set is taken from the ImageNet 2012 validation split. Each of the three sets includes 50 images per class.
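A fixed number of images per class, as described above, can be drawn with a simple per-class sampler. The following is a minimal sketch rather than the procedure actually used to build these splits: the function `sample_per_class`, the `(item, label)` input format, and the seeded uniform sampling are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def sample_per_class(samples, per_class, seed=0):
    """Pick `per_class` items for every label.

    `samples` is an iterable of (item, label) pairs (hypothetical format).
    Returns a dict mapping each label to its sampled items.
    """
    by_class = defaultdict(list)
    for item, label in samples:
        by_class[label].append(item)

    rng = random.Random(seed)
    subset = {}
    for label in sorted(by_class):
        items = by_class[label]
        if len(items) < per_class:
            raise ValueError(f"class {label!r} has only {len(items)} items")
        # Sample without replacement so no image is picked twice.
        subset[label] = rng.sample(items, per_class)
    return subset


# Toy pool: 3 classes with 60 placeholder file names each.
pool = [(f"img_{c}_{i}.jpg", c) for c in range(3) for i in range(60)]
subset = sample_per_class(pool, per_class=50)
print({label: len(items) for label, items in subset.items()})
```

If the training and validation subsets must not share images, one option under the same assumptions is to sample twice the per-class count and split each class's sample in half.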
The National Institute of Informatics provides researchers with the LIFULL HOME'S Dataset, which was contributed by LIFULL Co., Ltd. to promote research in informatics and related fields.
Dataset of 374 photos of hand-drawn sketches of App Inventor apps used for development of the Sketch2aia model for automatic generation of App Inventor wireframes from hand-drawn sketches.
Tsinghua Dogs is a fine-grained classification dataset for dogs, over 65% of whose images are collected from people's real life. Each dog breed in the dataset contains at least 200 images and at most 7,449 images, roughly in proportion to the breed's frequency of occurrence in China, which significantly increases the diversity for each breed over existing datasets. Furthermore, Tsinghua Dogs provides annotated bounding boxes of the dog's whole body and head in each image, which can be used for supervising the training of learning algorithms as well as testing them.
The Universal-Scale object detection Benchmark (USB) is a benchmark for object detection that has variations in object scales and image domains by incorporating COCO with the recently proposed Waymo Open Dataset and Manga109-s dataset. To enable fair comparison, USB establishes different protocols by defining multiple thresholds for training epochs and evaluation image resolutions.
The SBCoseg dataset includes 889 groups of images, and each group consists of 18 images with a common object, leading to 16,002 images in total. The whole dataset is divided into five subsets: with ECFB, with TR, with MH, with SD, and Normal (normal data). The five subsets contain 193, 251, 82, 83, and 280 image groups, respectively. Each original image is in JPG format with a pixel size of 360 × 360, and each ground-truth image is in PNG format.
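The group and image counts quoted above are consistent with each other, which a short check makes explicit. The numbers come from the description; the variable names are chosen here for illustration.

```python
# Image groups per subset, as stated in the description.
GROUPS_PER_SUBSET = {"ECFB": 193, "TR": 251, "MH": 82, "SD": 83, "Normal": 280}
IMAGES_PER_GROUP = 18

total_groups = sum(GROUPS_PER_SUBSET.values())
total_images = total_groups * IMAGES_PER_GROUP

print(total_groups)  # 889
print(total_images)  # 16002
```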
ArtDL is a novel painting data set for iconography classification composed of images collected from online sources. Most of the paintings are from the Renaissance period and depict scenes or characters of Christian art. The data set is annotated with classes representing specific characters belonging to the Iconclass classification system.
Seven different types of dry beans were used in this research, taking into account features such as form, shape, type, and structure in accordance with the market situation. A computer vision system was developed to distinguish seven different registered varieties of dry beans with similar features in order to obtain uniform seed classification. For the classification model, images of 13,611 grains of 7 different registered dry beans were taken with a high-resolution camera. The bean images obtained by the computer vision system were subjected to segmentation and feature extraction stages, and a total of 16 features (12 dimensions and 4 shape forms) were obtained from the grains.
MSRB is a benchmarking dataset for marine snow removal in underwater images. Marine snow, one of the main sources of degradation in underwater images, is caused by small particles, e.g., organic matter and sand, between the underwater scene and the photosensors. The dataset consists of large-scale pairs of ground-truth and degraded images, used to calculate objective quality measures for marine snow removal and to train a deep neural network. We propose two marine snow removal tasks using the dataset and show the first benchmarking results of marine snow removal.