3,275 machine learning datasets
The PubFig database is a large, real-world face dataset consisting of 58,797 images of 200 people collected from the internet. Unlike most other existing face datasets, these images are taken in completely uncontrolled situations with non-cooperative subjects. Thus, there is large variation in pose, lighting, expression, scene, camera, imaging conditions and parameters, etc. The PubFig dataset is similar in spirit to the Labeled Faces in the Wild (LFW) dataset.
The Rendered Handpose Dataset contains 41,258 training and 2,728 testing samples.
RISE is a large-scale video dataset for Recognizing Industrial Smoke Emissions. A citizen science approach was adopted to collaborate with local community members to annotate whether a video clip has smoke emissions. The dataset contains 12,567 clips from 19 distinct views from cameras that monitored three industrial facilities. These daytime clips span 30 days over two years, including all four seasons.
StereoMSI comprises 350 registered colour-spectral image pairs. The dataset has been used for the two tracks of the PIRM2018 challenge.
A dataset of real human and wax figure images and videos that targets the problem of face spoofing detection. It consists of more than 1,800 face images and 110 videos of 55 people/waxworks, arranged into training, validation and test sets with large variations in expression, illumination and pose.
Talk2Nav is a large-scale dataset with verbal navigation instructions.
A new text effects dataset with 141,081 text effect/glyph pairs in total. The dataset consists of 152 professionally designed text effects rendered on glyphs, including English letters, Chinese characters, and Arabic numerals.
40,764 images (11,659 protest images and hard negatives) with various annotations of visual attributes and sentiments.
This dataset contains 2,000 images taken from inside a warehouse of the Energy Company of Paraná (Copel), which directly serves more than 4 million consuming units in the Brazilian state of Paraná.
The Vistas-NP dataset is an out-of-distribution detection dataset based on the Mapillary Vistas dataset. The original Vistas dataset consists of 18,000 training images and 2,000 validation images with 66 classes. In Vistas-NP, the human classes are used as outliers due to their dispersion across scenes and their visual diversity from other objects. The dataset is created by moving all images containing the person class or any of the three rider classes to the test subset. Consequently, the dataset has 8,003 training images and 830 validation images, and the test set contains 11,167 images.
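The Vistas-NP split described above can be sketched as a simple filter over per-image label sets. This is an illustrative reconstruction, not the authors' script; the function name and the exact outlier class names are assumptions.

```python
# Hypothetical sketch of a Vistas-NP-style split: any image containing a
# human (outlier) class goes to the test subset; the rest stay in-distribution.
# Class names below are illustrative, not the exact Vistas label names.
OUTLIER_CLASSES = {"person", "bicyclist", "motorcyclist", "other rider"}

def split_vistas_np(labelled_images):
    """Send every image containing an outlier class to the test subset."""
    inliers, test = [], []
    for name, classes in labelled_images:
        if OUTLIER_CLASSES & set(classes):
            test.append(name)
        else:
            inliers.append(name)
    return inliers, test

# Toy usage
images = [("a.png", ["road", "car"]), ("b.png", ["road", "person"])]
inliers, test = split_vistas_np(images)
# inliers == ["a.png"], test == ["b.png"]
```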
A large-scale dataset that links the assessment of image quality issues to two practical vision tasks: image captioning and visual question answering.
Visual Beliefs is a dataset of abstract scenes for studying visual beliefs. The dataset consists of 8-frame scenes, and in each scene a person has a mistaken belief. The dataset can be used for two tasks: predicting who is mistaken and predicting when they are mistaken.
RSOC is a large-scale object counting dataset with remote sensing images, which contains four important geographic object categories: buildings, crowded ships in harbors, and large and small vehicles in parking lots.
The Full-Sentence Visual Question Answering (FSVQA) dataset consists of nearly 1 million pairs of questions and full-sentence answers for images, built by applying a number of rule-based natural language processing techniques to the original VQA dataset and to captions in the MS COCO dataset.
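A rule-based rewrite of short answers into full sentences, as used to build FSVQA, might look like the toy sketch below. The templates here are invented for illustration; the paper's actual rule set is far more extensive.

```python
# Illustrative templates in the spirit of FSVQA's rule-based construction.
# These two rules and the fallback are hypothetical examples, not the
# dataset's real rules.
def full_sentence_answer(question, answer):
    """Turn a short VQA answer into a full-sentence answer via templates."""
    q = question.rstrip("?").strip()
    if q.lower().startswith("is there "):
        rest = q[len("is there "):]
        if answer.lower() == "yes":
            return f"There is {rest}."
        article, _, noun = rest.partition(" ")
        if article in ("a", "an"):
            return f"There is no {noun}."
        return f"There is no {rest}."
    if q.lower().startswith("what color is "):
        subject = q[len("what color is "):]
        return f"{subject.capitalize()} is {answer}."
    return f"The answer is {answer}."

print(full_sentence_answer("What color is the cat?", "black"))
# The cat is black.
```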
VQA 360° is a dataset for visual question answering on 360° images containing around 17,000 real-world image-question-answer triplets for a variety of question types.
PHSPD is a polarization image dataset of various human shapes and poses.
Multi Task Crowd is a new 100-image dataset fully annotated for crowd counting, violent behaviour detection and density level classification.
One of the first datasets (if not the first) to highlight the importance of bias and diversity in the community, which started a revolution afterwards. It was introduced in 2014 as an integral part of a Master of Science thesis [1,2] at Carnegie Mellon and the City University of Hong Kong, and was later expanded with synthetic images generated by a GAN architecture at ETH Zürich (in HDCGAN by Curtó et al. 2017). It is thus not only a pioneer in discussing the importance of balanced datasets for learning and vision, but also the first GAN-augmented dataset of faces.
The Toulouse Road Network dataset describes patches of road maps from the city of Toulouse, represented both as spatial graphs G = (A, X) and as grayscale segmentation images.
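The G = (A, X) representation used above can be made concrete with a toy road patch: A is the adjacency matrix over junction nodes and X holds node features such as 2-D coordinates. The specific graph and feature layout below are illustrative assumptions, not the dataset's exact schema.

```python
# Minimal sketch of a G = (A, X) spatial road graph (toy example).
import numpy as np

# X: node features, here 2-D coordinates of three road junctions in a patch.
X = np.array([[0.0, 0.0],
              [1.0, 0.0],
              [1.0, 1.0]])

# A: symmetric adjacency matrix; A[i, j] = 1 if a road segment joins i and j.
A = np.zeros((3, 3), dtype=int)
for i, j in [(0, 1), (1, 2)]:
    A[i, j] = A[j, i] = 1

assert (A == A.T).all()  # undirected road graph
```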
FRGC-Morphs is a dataset of morphed faces selected from the publicly available FRGC dataset [1].