3,275 machine learning datasets
HICRD (Heron Island Coral Reef Dataset) is a large-scale real underwater image dataset for underwater image restoration. There are 2000 reference restored images and 6003 original underwater images in the unpaired training set.
Indian Masked faces in the wild Database is collected into three sets: (i) Indian Celebrity, (ii) Instagram and (iii) Indian Crowd. The Indian Celebrity set contains 40 Indian celebrities with 435 images, including Bollywood actors/actresses, television stars, sports personalities, and politicians. The Instagram set contains 377 images of 40 subjects downloaded from Instagram. We collected masked and non-masked images of Indian people with a public profile. The Indian Crowd set is collected from members of the public who volunteered to contribute to the dataset. This set contains 120 subjects with 562 images. All the images are collected in both constrained and unconstrained environments with variation in pose, illumination, background and masks worn by the people.
Photozilla is a large-scale dataset which includes over 990k images belonging to 10 different photographic styles. The dataset can be used to train classification models to automatically classify the images into the relevant style.
The synthetic dataset (10000 pairs of images and region, 2.95GB) is shared with the code (hdf5 dataset format).
The proposed Extended-YouTube Faces (E-YTF) is an extension of the famous YouTube Faces (YTF) dataset and is specifically designed to further push the challenges of face recognition by addressing the problem of open-set face identification from heterogeneous data, i.e., still images vs. video.
This is a dataset to benchmark real-time embedded object detection models for RoboCup SSL (Small Size League).
Cosmic rays in the LCO CR dataset are labeled accurately and consistently across many diverse observations from various instruments. To the best of our knowledge, this is the largest dataset of its kind. It consists of over 4,500 scientific images from Las Cumbres Observatory global telescope network's 23 instruments. Each sample in our dataset is a multi-extension FITS file, including three images, three corresponding CR masks, and three ignore masks.
BrazilDAM is a multi-sensor and multitemporal dataset that consists of multispectral images of ore tailings dams throughout Brazil. The images were captured by the Landsat 8 and Sentinel 2 satellites over the years 2016, 2017, 2018 and 2019. The dataset contains samples collected in different regions, which increases the diversity and representativeness of the characteristics of the dams.
SinGAN-Seg-polyps is a synthetic dataset for polyp segmentation consisting of 10,000 synthetic polyps and masks.
ObMan-Ego is a large-scale synthetic hand dataset with egocentric scenes in which the simulated hands are provided by ObMan. The dataset is used for a hand segmentation task and its sim-to-real adaptation benchmark. The training, validation, and testing sets contain 150,000, 6,500, and 6,500 images, respectively.
8 kinds of weld defects
The HumanoidRobotPose dataset is a dataset for real-time pose estimation of humanoid robots.
Disaster is a dataset that contains images collected from various sources for three different disasters: fire, water and land. It also contains images of infrastructure damaged by natural or man-made calamities and of people harmed by war or accidents.
DeCost, Hecht, Francis, Webler, Picard, and Holm. UHCSDB (Ultrahigh Carbon Steel micrograph DataBase): tools for exploring large heterogeneous microstructure datasets. Accepted for publication in IMMI, 2017. doi: 10.1007/s40192-017-0097-0
The Iranis Dataset is a large-scale dataset of Farsi license plate characters, with more than 83,000 images of Farsi numbers and letters collected from real-world license plate images captured by various cameras.
The TLFM dataset is structured in sequences of at least nine timesteps. It includes 9696 images of both brightfield (BF) and green fluorescent protein (GFP) channels at a resolution of 256 × 256. It is intended for multi-domain (BF and GFP) microscopy image sequence generation.
Revision: v1.0.0-full-20210527a DOI: 10.5281/zenodo.4817662 Authors: J. Chazalon, E. Carlinet, Y. Chen, J. Perret, C. Mallet, B. Duménieu and T. Géraud Official competition website: https://icdar21-mapseg.github.io/
SaRNet is a single-class dataset consisting of tiles of satellite imagery labeled with potential 'targets'. Labelers were instructed to draw boxes around anything they suspected might be a paraglider wing, missing in a remote area of Nevada. Volunteers were shown examples of similar objects already in the environment for comparison.
The BH-rPPG dataset (Beihang University Remote PhotoPlethysmoGraphy) covers 3 lighting conditions with uneven distribution and was collected in an indoor environment. To evaluate the performance of deep-learning-based rPPG under different lighting conditions, we recruited twelve healthy subjects (11 males and 1 female) on campus, with a mean age of 32 and an SD of 2.5.
SyDog is a synthetic dataset of dogs containing ground truth pose and bounding box coordinates, generated using the Unity game engine.