3,275 machine learning datasets
Chart2Text is a dataset crawled from 23,382 freely accessible pages on statista.com in early March 2020, yielding a total of 8,305 charts and associated summaries. For each chart, the chart image, the underlying data table, the title, the axis labels, and a human-written summary describing the statistic were downloaded.
2-PM Vessel is an open-source volumetric brain vasculature dataset obtained with two-photon microscopy at the Focused Ultrasound Lab at Sunnybrook Research Institute (affiliated with the University of Toronto) by Dr. Alison Burgess, Charissa Poon and Marc Santos. The dataset contains a total of 12 volumetric stacks consisting of images of mouse brain vasculature and tumour vasculature.
CubiCasa5K is a large-scale floorplan image dataset containing 5000 samples annotated into over 80 floorplan object categories. The dataset annotations are performed in a dense and versatile manner by using polygons for separating the different objects.
DAWN emphasizes a diverse traffic environment (urban, highway and freeway) as well as a rich variety of traffic flow. The DAWN dataset comprises a collection of 1000 images from real-traffic environments, which are divided into four sets of weather conditions: fog, snow, rain and sandstorms. The dataset is annotated with object bounding boxes for autonomous driving and video surveillance scenarios. This data helps interpret the effects of adverse weather conditions on the performance of vehicle detection systems.
DeepScores contains high-quality images of musical scores, partitioned into 300,000 sheets of written music that contain symbols of different shapes and sizes. It is intended to advance the state of the art in small-object recognition and to place the question of object recognition in the context of scene understanding.
A new dataset of handwritten text with fine-grained annotations at the character level, accompanied by results from an initial user evaluation.
Kuzushiji-49 is an MNIST-like dataset that has 49 classes (28x28 grayscale, 270,912 images) from 48 Hiragana characters and one Hiragana iteration mark.
LAD (Large-scale Attribute Dataset) has 78,017 images of 5 super-classes and 230 classes. LAD contains more images than the four most popular attribute datasets (AwA, CUB, aP/aY and SUN) combined. 359 attributes covering visual, semantic and subjective properties are defined and annotated at the instance level.
openDD is annotated using images taken by a drone in 501 separate flights, totalling over 62 hours of trajectory data. As of today, openDD is by far the largest publicly available trajectory dataset recorded from a drone perspective, while comparable datasets span 17 hours at most.
xR-EgoPose is a synthetic dataset for egocentric 3D human pose estimation. It consists of ~380 thousand photo-realistic egocentric camera images in a variety of indoor and outdoor spaces.
FERET-Morphs is a dataset of morphed faces selected from the publicly available FERET dataset.
The Caltech Mouse Social Interactions (CalMS21) dataset is a multi-agent dataset from behavioral neuroscience. The dataset consists of trajectory data of social interactions, recorded from videos of freely behaving mice in a standard resident-intruder assay. The CalMS21 dataset is part of the Multi-Agent Behavior Challenge 2021.
We release expert-made scribble annotations for the medical ACDC dataset. The released data must be considered as extending the original ACDC dataset. The ACDC dataset contains cardiac MRI images, paired with hand-made segmentation masks. It is possible to use the segmentation masks provided in the ACDC dataset to evaluate the performance of methods trained using only scribble supervision.
e-ViL is a benchmark for explainable vision-language tasks. e-ViL spans three datasets of human-written NLEs (natural language explanations) and provides a unified evaluation framework designed to be reusable in future work.
ATD-12K is a large-scale animation triplet dataset comprising 12,000 triplets (train10k, test2k) selected by manual inspection. The test2k split has rich annotations, including levels of difficulty, Regions of Interest (RoIs) on movements, and tags on motion categories.
PointQA is a set of datasets for Visual Question Answering (VQA) that require a pointer to an object in the image to be answered correctly. The datasets are PointQA-Local, PointQA-LookTwice and PointQA-General.
TMED is a clinically-motivated benchmark dataset for computer vision and machine learning from limited labeled data.
ReaSCAN is a synthetic navigation task that requires models to reason about surroundings over syntactically difficult languages.
The REFLACX dataset contains eye-tracking data for 3,032 readings of chest x-rays by five radiologists. The dictated reports were transcribed and have timestamps synchronized with the eye-tracking data.
SpaceNet 2: Building Detection v2 is a dataset for building footprint detection in geographically diverse settings from very high resolution satellite images. It contains 302,701 building footprints and 3/8-band Worldview-3 satellite imagery at 0.3 m pixel resolution across 5 cities (Rio de Janeiro, Las Vegas, Paris, Shanghai, Khartoum), and covers areas that are both urban and suburban in nature. The dataset was split 60%/20%/20% into train/test/validation.
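As a rough illustration of a 60%/20%/20% train/test/validation split like the one described for SpaceNet 2, the sketch below shuffles a list of items and slices it by those proportions. The function name `split_60_20_20`, the seeded random shuffle, and the placeholder item list are assumptions made for this example; the listing does not say how the actual SpaceNet split was drawn.

```python
import random


def split_60_20_20(items, seed=0):
    """Shuffle items and split them 60/20/20 into train/test/validation.

    Hypothetical helper: a seeded random shuffle is an assumption here,
    not the documented SpaceNet 2 procedure.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)

    # Integer arithmetic avoids floating-point rounding surprises.
    n_train = len(shuffled) * 60 // 100
    n_test = len(shuffled) * 20 // 100

    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    # Validation takes the remainder, so no item is dropped.
    validation = shuffled[n_train + n_test:]
    return train, test, validation


# Placeholder IDs standing in for image tiles.
train, test, validation = split_60_20_20(range(1000))
print(len(train), len(test), len(validation))
```

Giving the remainder to the validation set keeps every item in exactly one subset even when the total is not divisible by five.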