SeaDronesSee is a large-scale dataset aimed at helping develop systems for Search and Rescue (SAR) using Unmanned Aerial Vehicles (UAVs) in maritime scenarios. Building highly complex autonomous UAV systems that aid in SAR missions requires robust computer vision algorithms to detect and track objects or persons of interest. The dataset provides three tracks: object detection, single-object tracking, and multi-object tracking. Each track has its own dataset and leaderboard.
The Endomapper dataset is the first collection of complete endoscopy sequences acquired during regular medical practice, including slow and careful screening explorations, making secondary use of medical data. Its original purpose is to facilitate the development and evaluation of VSLAM (Visual Simultaneous Localization and Mapping) methods on real endoscopy data. The first release of the dataset is composed of 50 sequences with a total of more than 13 hours of video. It is also the first endoscopic dataset that includes both the computed geometric and photometric endoscope calibration and the original calibration videos. Meta-data and annotations associated with the dataset range from anatomical-landmark and procedure-description labeling to tool segmentation masks, COLMAP 3D reconstructions, simulated sequences with ground truth, and meta-data on special cases such as sequences from the same patient. This information will advance research in endoscopic VSLAM.
The gender-biased FFHQ dataset (bFFHQ) uses age as the target label and gender as a correlated bias attribute; the images are drawn from the FFHQ dataset. The training data is dominated by young women (aged 10-29) and old men (aged 40-59).
The dataset was created for the video quality assessment problem. It consists of 36 clips from Vimeo, selected from 18,000+ open-source clips with high bitrate (CC BY or CC0 license).
TID2013 is a dataset for image quality assessment that contains 25 reference images and 3000 distorted images (25 reference images x 24 types of distortions x 5 levels of distortions).
ApolloCar3D is a dataset that contains 5,277 driving images and over 60K car instances, where each car is fitted with an industry-grade 3D CAD model with absolute model size and semantically labelled keypoints. This dataset is more than 20 times larger than PASCAL3D+ and KITTI, the previous state of the art.
The Kumar dataset contains 30 1,000×1,000 image tiles from seven organs (6 breast, 6 liver, 6 kidney, 6 prostate, 2 bladder, 2 colon and 2 stomach) of The Cancer Genome Atlas (TCGA) database acquired at 40× magnification. Within each image, the boundary of each nucleus is fully annotated.
The New College Dataset is a freely available dataset collected from a robot completing several loops outdoors around the New College campus in Oxford. The data includes odometry, laser scan, and visual information. The dataset URL is no longer working.
SceneNet is a dataset of labelled synthetic indoor scenes.
CDTB (color-and-depth visual object tracking) is a dataset recorded with several passive and active RGB-D setups and contains indoor as well as outdoor sequences acquired in direct sunlight. The sequences were recorded to contain significant object pose change, clutter, occlusion, and periods of long-term target absence, enabling tracker evaluation under realistic conditions. Sequences are annotated per frame with 13 visual attributes for detailed analysis. The dataset contains around 100,000 samples. Source: https://www.vicos.si/Projects/CDTB
CASIA-HWDB is a dataset for handwritten Chinese character recognition. It contains 300 files (240 in HWDB1.1 training set and 60 in HWDB1.1 test set). Each file contains about 3000 isolated gray-scale Chinese character images written by one writer, as well as their corresponding labels.
The D-HAZY dataset is generated from the NYU Depth indoor image collection and provides a depth map for each indoor hazy image. It contains 1,400+ real images and corresponding depth maps used to synthesize hazy scenes based on Koschmieder's light propagation model.
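Haze synthesis of this kind can be sketched with Koschmieder's model, I(x) = J(x)·t(x) + A·(1 − t(x)) with transmission t(x) = exp(−β·d(x)). The snippet below is a minimal illustration, not the D-HAZY generation code; the scattering coefficient `beta` and airlight `airlight` are assumed example values.

```python
import numpy as np

def synthesize_haze(clear, depth, beta=1.0, airlight=0.8):
    """Apply Koschmieder's light-propagation model to a clear image.

    clear:    H x W x 3 float array in [0, 1] (the haze-free image J)
    depth:    H x W float array of scene depths d(x)
    beta:     atmospheric scattering coefficient (assumed value)
    airlight: global atmospheric light A (assumed value)
    """
    # Transmission map: t(x) = exp(-beta * d(x)), broadcast over channels
    t = np.exp(-beta * depth)[..., None]
    # Hazy image: I(x) = J(x) * t(x) + A * (1 - t(x))
    return clear * t + airlight * (1.0 - t)
```

Pixels at greater depth have lower transmission and therefore pick up more airlight, which is what gives distant scene content its washed-out appearance.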
CLEVR-Ref+ is a synthetic diagnostic dataset for referring expression comprehension. The precise locations and attributes of the objects are readily available, and the referring expressions are automatically associated with functional programs. The synthetic nature allows control over dataset bias (through sampling strategy), and the modular programs enable intermediate reasoning ground truth without human annotators.
The Fluorescence Microscopy Denoising (FMD) dataset is dedicated to Poisson-Gaussian denoising. The dataset consists of 12,000 real fluorescence microscopy images obtained with commercial confocal, two-photon, and wide-field microscopes and representative biological samples such as cells, zebrafish, and mouse brain tissues. Image averaging is used to obtain ground-truth images, alongside 60,000 noisy images with different noise levels.
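The two ideas behind FMD, a Poisson-Gaussian noise model and image averaging to approximate ground truth, can be illustrated with a small simulation. This is a generic sketch with assumed parameters (`photons`, `sigma`), not the FMD acquisition pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_gaussian(img, photons=50.0, sigma=0.02):
    """Simulate Poisson shot noise plus Gaussian read noise.

    img: clean image in [0, 1]; photons and sigma are assumed parameters
    controlling the signal-dependent and signal-independent noise levels.
    """
    shot = rng.poisson(img * photons) / photons   # signal-dependent component
    read = rng.normal(0.0, sigma, img.shape)      # signal-independent component
    return shot + read

clean = np.full((64, 64), 0.5)
single = poisson_gaussian(clean)
# Averaging many noisy captures of the same static scene suppresses the
# zero-mean noise and approximates the ground-truth image.
avg = np.mean([poisson_gaussian(clean) for _ in range(50)], axis=0)
```

Since both noise components are (approximately) zero-mean around the clean signal, the averaged image converges to the clean one at a rate of roughly 1/√N for N captures.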
MLRSNet is a multi-label high-spatial-resolution remote sensing dataset for semantic scene understanding. It provides different perspectives of the world captured from satellites; that is, it is composed of high-spatial-resolution optical satellite images. MLRSNet contains 109,161 remote sensing images annotated into 46 categories, with the number of sample images per category varying from 1,500 to 3,000. The images have a fixed size of 256×256 pixels with various pixel resolutions (~10m to 0.1m). Moreover, each image in the dataset is tagged with several of 60 predefined class labels, and the number of labels associated with each image varies from 1 to 13. The dataset can be used for multi-label image classification, multi-label image retrieval, and image segmentation.
RPC is a large-scale retail product checkout dataset covering 200 retail SKUs. The SKUs fall into 17 meta-categories: puffed food, dried fruit, dried food, instant drink, instant noodles, dessert, drink, alcohol, milk, canned food, chocolate, gum, candy, seasoner, personal hygiene, tissue, and stationery.
The dataset for this challenge was obtained by carefully annotating tissue images of several patients with tumors of different organs, diagnosed at multiple hospitals. It was created by downloading H&E-stained tissue images captured at 40x magnification from the TCGA archive. H&E staining is a routine protocol to enhance the contrast of a tissue section and is commonly used for tumor assessment (grading, staging, etc.). Given the diversity of nuclei appearances across multiple organs and patients, and the richness of staining protocols adopted at multiple hospitals, the training dataset will enable the development of robust and generalizable nuclei segmentation techniques that work right out of the box.
VQA-E is a dataset for Visual Question Answering with Explanation, where models are required to generate an explanation along with the predicted answer. The VQA-E dataset is automatically derived from the VQA v2 dataset by synthesizing a textual explanation for each image-question-answer triple.
The shiny folder contains 8 scenes with challenging view-dependent effects used in our paper. We also provide additional scenes in the shiny_extended folder. The test images for each scene used in our paper consist of one of every eight images in alphabetical order.
The York Urban Line Segment Database is a compilation of 102 images (45 indoor, 57 outdoor) of urban environments consisting mostly of scenes from the campus of York University and downtown Toronto, Canada. The images are 640×480 in size and were taken with a calibrated Panasonic Lumix DMC-LC80 digital camera.