3,275 machine learning datasets
3,275 dataset results
ScanBank is a benchmark dataset for figure extraction from scanned electronic theses and dissertations containing 10 thousand scanned page images, manually labeled by humans as to the presence of the 3.3 thousand figures or tables found therein.
Global WHEAT Dataset 2021 is the extentions of the Global Wheat Dataset 2020. It is the first large-scale dataset for wheat head detection from field optical images. It included a very large range of cultivars from differents continents. Wheat is a staple crop grown all over the world and consequently interest in wheat phenotyping spans the globe. Therefore, it is important that models developed for wheat phenotyping, such as wheat head detection networks, generalize between different growing environments around the world.
RaidaR is a rich annotated image dataset of rainy street scenes. RaidaR consists of 58,542 real rainy images containing several rain-induced artifacts: fog, droplets, road reflections, etc. 5,000/3,658 images were carefully semantic/instance segmentated, respectively.
CalCROP21 is a georeferenced multi-spectral dataset of satellite Imagery and crop labels. It is a semantic segmentation benchmark dataset, for the diverse crops in the Central Valley region of California at 10m spatial resolution using a Google Earth Engine based robust image processing pipeline.
This is the dataset for the CGF 2021 paper "DONeRF: Towards Real-Time Rendering of Compact Neural Radiance Fields using Depth Oracle Networks".
Human keypoint dataset of anime/manga-style character illustrations. Extension of the AnimeDrawingsDataset, with additional features:
5 domains: synthetic domain, document domain, street view domain, handwritten domain, and car license domain over five million images
Large-scale shadows from buildings in a city play an important role in determining the environmental quality of public spaces. They can be both beneficial, such as for pedestrians during summer, and detrimental, by impacting vegetation and by blocking direct sunlight. Determining the effects of shadows requires the accumulation of shadows over time across different periods in a year. In our paper Shadow Accrual Maps: Efficient Accumulation of City-Scale Shadows over Time, we present a simple yet efficient class of approach that uses the properties of sun movement to track the changing position of shadows within a fixed time interval. This repository presents the computed shadow information for New York City, Chicago, Los Angeles, Boston and Washington DC.
Pano3D is a new benchmark for depth estimation from spherical panoramas. Its goal is to drive progress for this task in a consistent and holistic manner. The Pano3D 360 depth estimation benchmark provides a standard Matterport3D train and test split, as well as a secondary GibsonV2 partioning for testing and training as well. The latter is used for zero-shot cross dataset transfer performance assessment and decomposes it into 3 different splits, each one focusing on a specific generalization axis.
Fashion-MNT is large-scale bilingual product description dataset called Fashion-MMT, which contains over 114k noisy and 40k manually cleaned description translations with multiple product images.
MuCo-VQA consist of large-scale (3.7M) multilingual and code-mixed VQA datasets in multiple languages: Hindi (hi), Bengali (bn), Spanish (es), German (de), French (fr) and code-mixed language pairs: en-hi, en-bn, en-fr, en-de and en-es.
Saint Gall dataset contains handwritten historical manuscripts written in Latin that date back to the 9th century. It consists of 60 pages, 1 410 text lines and 11 597 words.
MuViHand is a dataset for 3D Hand Pose Estimation that consists of multi-view videos of the hand along with ground-truth 3D pose labels. The dataset includes more than 402,000 synthetic hand images available in 4,560 videos. The videos have been simultaneously captured from six different angles with complex backgrounds and random levels of dynamic lighting. The data has been captured from 10 distinct animated subjects using 12 cameras in a semi-circle topology.
A dataset for Visual Voice Activity Detection extracted from the LRS3 dataset.
OpenViDial 2.0 is a larger-scale open-domain multi-modal dialogue dataset compared to the previous version OpenViDial 1.0. OpenViDial 2.0 contains a total number of 5.6 million dialogue turns extracted from either movies or TV series from different resources, and each dialogue turn is paired with its corresponding visual context.
Mila Simulated Floods Dataset is a 1.5 square km virtual world using the Unity3D game engine including urban, suburban and rural areas.
EDFace-Celeb-1M is a public Ethnically Diverse Face dataset which is used to benchmark the task of face hallucination. The dataset includes 1.7 million photos that cover different countries, with balanced race composition.
Web services are often protected with a challenge that's supposed to be easy for people to solve, but difficult for computers. Such a challenge is often called a CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) or HIP (Human Interactive Proof). HIPs are used for many purposes, such as to reduce email and blog spam and prevent brute-force attacks on web site passwords.
325 word images intended for font recognition, whose fonts are included in VFR-447 (and VFR-2420).
Automated measurement of fetal head circumference using 2D ultrasound images