The OCR-IDL dataset comprises OCR annotations for a subset of 26M pages of the large-scale IDL document library. These annotations have a monetary value of over $20,000 and are made publicly available with the aim of advancing the Document Intelligence research field. Our motivation is two-fold: first, by making these annotations public, we aim to level the playing field between research groups and companies that have large private datasets to pre-train on. Second, we use a commercial OCR engine to obtain high-quality annotations, reducing the noise that OCR introduces in pretraining and downstream tasks.
This dataset contains 775 video sequences, captured in the Lindenthal wildlife park (Cologne, Germany) as part of the AMMOD project using an Intel RealSense D435 stereo camera. In addition to color and infrared images, the D435 can infer the distance (or "depth") to objects in the scene using stereo vision. Observed animals include various birds (at daytime) and mammals such as deer, goats, sheep, donkeys, and foxes (primarily at nighttime). A subset of 412 images is annotated with a total of 1,038 individual animal annotations, including instance masks, bounding boxes, class labels, and corresponding track IDs that identify the same individual over the entire video.
MAPS-KB is a million-scale probabilistic simile knowledge base, covering 4.3 million triplets over 0.4 million terms from 70 GB corpora. It is designed for the tasks of simile detection and component extraction.
Usually, the information on the crop types present in a given territory is annual: we only know the main crop grown over a year, not which crops have followed one another during the year, nor when a particular crop is sown and when it is harvested. The main objective of this dataset is to create a basis for experimenting with solutions that answer these questions reliably, or for proposing models capable of producing dynamic segmentation maps that show when a crop begins to grow and when it is harvested, and consequently whether more than one crop has been grown in a territory within a year. In this dataset, we have 20 coverage classes as ground-truth values provided by Regione Lombardia. The mapping of the class labels used (see file lombardia-classes/classes25pc.txt) merges some classes and provides the time intervals within which each category grows. The last two c
BG Vulnerable Pedestrian (BGVP) is a dataset to help train well-rounded models and thus spur research to increase the efficacy of vulnerable pedestrian detection. The dataset contains 2,000 images with 5,932 bounding box instances from four categories: Children Without Disability, Elderly Without Disability, With Disability, and Non-Vulnerable.
TAS-NIR is a VIS+NIR dataset of semantically annotated images in unstructured outdoor environments. It consists of 209 VIS+NIR image pairs with a fine-grained semantic segmentation.
SPARF is a large-scale ShapeNet-based synthetic dataset for novel view synthesis consisting of ~17 million images rendered from nearly 40,000 shapes at high resolution (400×400 pixels).
Causal Triplet is a causal representation learning benchmark featuring not only visually more complex scenes, but also two crucial desiderata commonly overlooked in previous works.
VTC is a large-scale multimodal dataset containing video-caption pairs (~300k) alongside comments that can be used for multimodal representation learning.
FaceOcc is a high-quality face occlusion dataset that relabels all mislabeled occlusions in CelebAMask-HQ and supplements additional occlusions and textures collected from the Internet. The occlusion types cover sunglasses, spectacles, hands, masks, scarves, microphones, etc.
DeePhy is a novel DeepFake phylogeny dataset consisting of 5,040 DeepFake videos generated using three different generation techniques. It is one of the first datasets to incorporate the concept of DeepFake phylogeny, i.e., generating DeepFakes by applying multiple generation techniques in a sequential manner.
The temporal variability in the calving front positions of marine-terminating glaciers permits inference of frontal ablation. Frontal ablation, the sum of the calving rate and the melt rate at the terminus, contributes significantly to the mass balance of glaciers. Accordingly, glacier area has been declared an Essential Climate Variable product by the World Meteorological Organization. The presented dataset provides the information necessary for training deep learning techniques to automate the process of calving front delineation. The dataset includes Synthetic Aperture Radar (SAR) images of seven glaciers distributed around the globe. Five of them are located in Antarctica: Crane, Dinsmoore-Bombardier-Edgeworth, Mapple, Jorum, and the Sjörgen-Inlet Glacier. The remaining glaciers are the Jakobshavn Isbrae Glacier in Greenland and the Columbia Glacier in Alaska. Several images were taken of each glacier, forming a time series. The time series lie in the time span between 1995 an
The Fetoscopic Placental Vessel Segmentation and Registration (FetReg2021) challenge was organized as part of the MICCAI 2021 Endoscopic Vision (EndoVis) challenge. Through the FetReg2021 challenge, we released the first large-scale multi-centre dataset of the fetoscopic laser photocoagulation procedure. The dataset contains 2,718 pixel-wise annotated images (for the background, vessel, fetus, and tool classes) from 24 different in vivo TTTS fetoscopic surgeries, plus 24 unannotated video clips containing 9,616 frames, for training and testing. The dataset is useful for the development of generalized and robust semantic segmentation and video mosaicking algorithms for long-duration fetoscopy videos.
AdvNet is a dataset of traffic signs images. Specifically, it includes adversarial traffic sign images (i.e., pictures of traffic signs with stickers on their surface) that can fool state-of-the-art neural network-based perception systems and clean traffic sign images without any stickers on them.
We provide multiple human annotations for each test image in Fashion-MNIST. These annotations can be used as soft (probabilistic) labels instead of the usual hard (single) labels.
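As an illustration, soft labels can be formed by normalizing the annotators' votes into a per-class probability distribution. The sketch below assumes the annotations arrive as lists of integer class votes per image; the actual annotation format of the released files may differ.

```python
from collections import Counter

NUM_CLASSES = 10  # Fashion-MNIST has 10 classes


def soft_label(annotations, num_classes=NUM_CLASSES):
    """Turn a list of per-annotator class votes for one image into a
    probability distribution over the classes (a soft label)."""
    counts = Counter(annotations)
    total = len(annotations)
    return [counts.get(c, 0) / total for c in range(num_classes)]


# Hypothetical example: five annotators, four vote class 6 ("shirt"),
# one votes class 4 ("coat")
probs = soft_label([6, 6, 4, 6, 6])
# probs[6] == 0.8, probs[4] == 0.2, all other entries are 0.0
```

Training against such distributions (e.g. with a cross-entropy loss over the full vector) lets a model learn from annotator disagreement instead of discarding it.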
CIMAT-Cyclist is a benchmark for cyclist orientation detection, with bounding-box labels over eight classes depending on the orientation. It contains 11,103 images: 6,605 were collected from approximately 450 videos and photographs taken at sports events and on the streets of the state of Zacatecas, Mexico, while 4,498 additional images were obtained from web sources such as Pixabay, Pexels, and FreePhotos. CIMAT-Cyclist provides 20,229 instances over the 11,103 cyclist images, where 80% of the images were assigned to the training set and 20% to the test set.
IBL-NeRF Dataset. Contains multi-view images with their intrinsic components.
Differential fluorescent staining is an effective tool widely adopted for the visualization, segmentation, and quantification of cells and cellular substructures as part of standard microscopic imaging protocols. However, the incompatibility of staining agents with viable cells is a major and often unavoidable limitation to its applicability in live experiments, requiring the extraction of samples at different stages of an experiment and increasing laboratory costs. Accordingly, the development of computerized image analysis methodology capable of segmenting and quantifying cells and cellular substructures from plain monochromatic light-microscopy images, without the help of any physical markup techniques, is of considerable interest. The enclosed set contains microscopic images of human colon adenocarcinoma Caco-2 cells obtained under various imaging conditions with different fractions of viable vs. non-viable cells. Each field of view is provided in a three-fold representation, including phase-con
We applied our framework, dubbed "PreNeRF 360", to enable the use of the Nutrition5k dataset in NeRF, and introduce an updated version of this dataset, known as the N5k360 dataset.
The HAMMER dataset contains 13 scenes. Each scene has two setups, with and without objects ("with": the scene includes several objects with various surface materials; "without": the scene contains only the background), and each setup has two camera trajectories. Each trajectory comprises roughly 300 frames, which adds up to about 16k frames in total (13 x 2 x 2 x 300). Each trajectory contains corresponding images from each camera: d435 (stereo), l515 (LiDAR, D-ToF), polarization (RGBP, RGB with polarization), and tof (I-ToF). Each camera folder contains the camera's intrinsics file and its own recorded images, together with rendered depth ground truth, instance ground truth, and the camera pose. All cameras are fully synchronized via the robotic arm's data acquisition setup.