Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

3,275 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

3,275 dataset results

OULU-NPU

The Oulu-NPU face presentation attack detection database consists of 4,950 real access and attack videos. These videos were recorded using the front cameras of six mobile devices (Samsung Galaxy S6 edge, HTC Desire EYE, MEIZU X5, ASUS Zenfone Selfie, Sony XPERIA C5 Ultra Dual, and OPPO N3) in three sessions with different illumination conditions and background scenes. The presentation attack types considered in the OULU-NPU database are print and video-replay. The 2D face artefacts were created using two printers and two display devices.

8 papers · 16 benchmarks · Images, Videos

FloorPlanCAD

FloorPlanCAD is a large-scale real-world CAD drawing dataset containing over 15,000 floor plans, ranging from residential to commercial buildings.

8 papers · 0 benchmarks · CAD, Images

ROBUST-MIS (Robust Medical Instrument Segmentation Challenge 2019)

The ROBUST-MIS dataset was made available to support the Robust Medical Instrument Segmentation (ROBUST-MIS) Challenge 2019, part of the Endoscopic Vision Challenge associated with MICCAI.

8 papers · 3 benchmarks · Images, Videos

ZJU-RGB-P

Research on semantic segmentation of traffic scenes using color and polarization information (including training and testing sets).

8 papers · 4 benchmarks · Images

COUCH

COUCH is a large human-chair interaction dataset with clean annotations. The dataset consists of 3 hours and over 500 sequences of motion capture (MoCap) on human-chair interactions.

8 papers · 0 benchmarks · Images

Echonet-Dynamic

Echocardiography, or cardiac ultrasound, is the most widely used and readily available imaging modality for assessing cardiac function and structure. Combining portable instrumentation, rapid image acquisition, and high temporal resolution, all without the risks of ionizing radiation, echocardiography is one of the most frequently utilized imaging studies in the United States and serves as the backbone of cardiovascular imaging. For conditions ranging from heart failure to valvular heart disease, echocardiography is often both necessary and sufficient for diagnosis. In addition to a deep learning model, the authors introduce a new large video dataset of echocardiograms for computer vision research. The EchoNet-Dynamic database includes 10,030 labeled echocardiogram videos and human expert annotations (measurements, tracings, and calculations) to provide a baseline for studying cardiac motion and chamber sizes.

8 papers · 0 benchmarks · Images

GPA (Geometric Pose Affordance)

Multi-view imagery of people interacting with a variety of rich 3D environments.

8 papers · 0 benchmarks · Images

WorldStrat (The WorldStrat Dataset: Open High-Resolution Satellite Imagery With Paired Multi-Temporal Low-Resolution)

Nearly 10,000 km² of free high-resolution and paired multi-temporal low-resolution satellite imagery of unique locations, ensuring stratified representation of all types of land use across the world: from agriculture to ice caps, from forests to multiple urbanization densities.

8 papers · 0 benchmarks · Images, Time series

University of Waterloo skin cancer database

The dataset is maintained by the Vision and Image Processing Lab at the University of Waterloo. The images were extracted from the public DermIS and DermQuest databases, along with manual segmentations of the lesions.

8 papers · 4 benchmarks · Images

NCT-CRC-HE-100K

The NCT-CRC-HE-100K dataset is a set of 100,000 non-overlapping image patches extracted from 86 H&E-stained human cancer tissue slides and normal tissue from the NCT biobank (National Center for Tumor Diseases) and the UMM pathology archive (University Medical Center Mannheim). The companion Colorectal Cancer-Validation-Histology-7K dataset (CRC-VAL-HE-7K) consists of 7,180 images extracted from 50 patients with colorectal adenocarcinoma, chosen so that its patients do not overlap with those in NCT-CRC-HE-100K. The dataset was created by pathologists who manually delineated tissue regions in whole-slide images into the following nine tissue classes: adipose (ADI), background (BACK), debris (DEB), lymphocytes (LYM), mucus (MUC), smooth muscle (MUS), normal colon mucosa (NORM), cancer-associated stroma (STR), and colorectal adenocarcinoma epithelium (TUM).

8 papers · 9 benchmarks · Images
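The nine tissue-class abbreviations above map naturally to integer labels for classification work. A minimal sketch (the alphabetical label ordering is an illustrative assumption, not part of the dataset specification):

```python
# Class codes are taken from the dataset description; the integer
# label assignment below is a hypothetical convention for training.
TISSUE_CLASSES = {
    "ADI": "adipose",
    "BACK": "background",
    "DEB": "debris",
    "LYM": "lymphocytes",
    "MUC": "mucus",
    "MUS": "smooth muscle",
    "NORM": "normal colon mucosa",
    "STR": "cancer-associated stroma",
    "TUM": "colorectal adenocarcinoma epithelium",
}

# Fixed (alphabetical) integer labels, so runs are reproducible:
LABELS = {code: i for i, code in enumerate(sorted(TISSUE_CLASSES))}
```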

CLEVR-Math

CLEVR-Math is a multi-modal math word problems dataset consisting of simple word problems involving addition/subtraction, represented partly by a textual description and partly by an image illustrating the scenario. These word problems require a combination of language, visual, and mathematical reasoning.

8 papers · 0 benchmarks · Images, Texts

H3WB (Human 3.6M 3D WholeBody)

Human3.6M 3D WholeBody (H3WB) is a large-scale dataset with 133 whole-body keypoint annotations on 100K images, made possible by a new multi-view pipeline. It is designed for three new tasks: i) 3D whole-body pose lifting from a complete 2D whole-body pose, ii) 3D whole-body pose lifting from an incomplete 2D whole-body pose, and iii) 3D whole-body pose estimation from a single RGB image.

8 papers · 15 benchmarks · Images

Duke Breast Cancer MRI (Dynamic contrast-enhanced magnetic resonance images of breast cancer patients with tumor locations)

Breast MRI scans of 922 cancer patients from Duke University, with tumor bounding-box annotations plus clinical, imaging, and many other features.

8 papers · 0 benchmarks · Images, MRI

VIGOR

Similar to CVUSA and CVACT, the VIGOR dataset contains paired satellite and street-view imagery for cross-view matching, i.e., determining the location of a street-view image by matching it against satellite imagery. Data from four major American cities were used: San Francisco, New York, Seattle, and Chicago. Unlike the previous datasets, VIGOR offers two settings: in the SAME-Area setting, images of all cities are available in the training and validation splits; in the CROSS-Area setting, training is done on two cities (New York, Seattle) and evaluation on the other two (Chicago, San Francisco). In addition, the dataset contains semi-positive images that are very close to an actual ground-truth image and thus serve as distractors for the matching task. In total, the dataset consists of 90,618 satellite images and 105,214 street images.

8 papers · 0 benchmarks · Images

OPRA (Online Product Reviews for Affordances)

The OPRA dataset was introduced in Demo2Vec: Reasoning Object Affordances From Online Videos (CVPR'18) for reasoning about object affordances from online demonstration videos. It contains 11,505 demonstration clips and 2,512 object images scraped from 6 popular YouTube product review channels, along with the corresponding affordance annotations. More details can be found at https://sites.google.com/view/demo2vec/.

8 papers · 2 benchmarks · Images, Videos

WebUI

The WebUI dataset contains 400K web UIs captured over a period of 3 months at a crawling cost of about $500. Web pages were grouped by domain name before generating training (70%), validation (10%), and testing (20%) splits, ensuring that similar pages from the same website appear in the same split. Four versions of the training dataset were created. Three were generated by randomly sampling subsets of the training split: Web-7k, Web-70k, and Web-350k; 70k was chosen as a baseline size because it is approximately the size of existing UI datasets. An additional split (Web-7k-Resampled) provides a smaller, higher-quality subset for experimentation: it was generated with a class-balancing sampling technique, and screens with possible visual defects (e.g., very small, occluded, or invisible elements) were removed. The validation and test splits were always kept the same.

8 papers · 0 benchmarks · Images, Texts
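The domain-grouped splitting described for WebUI can be sketched with a deterministic hash. This is an illustrative helper, not the authors' code; only the 70/10/20 proportions and the grouping-by-domain idea come from the description:

```python
import hashlib


def assign_split(domain: str) -> str:
    """Assign every page of a domain to the same split (hypothetical helper).

    Hashing the domain name, rather than the individual page URL,
    guarantees that similar pages from one website never leak across
    splits. Buckets approximate 70/10/20 train/val/test proportions.
    """
    bucket = int(hashlib.md5(domain.encode("utf-8")).hexdigest(), 16) % 100
    if bucket < 70:
        return "train"
    if bucket < 80:
        return "val"
    return "test"


# Pages sharing a domain always land in the same split:
pages = ["shop.example.com/cart", "shop.example.com/checkout"]
splits = {assign_split(p.split("/")[0]) for p in pages}
```

Hashing gives a stable assignment across crawls: re-running the split on new pages from an already-seen domain cannot move that domain to a different split.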

SciGraphQA

SciGraphQA is a large-scale, open-domain dataset focused on generating multi-turn conversational question-answering dialogues centered around understanding and describing scientific graphs and figures. It contains over 300,000 samples derived from academic research papers in computer science and machine learning domains.

8 papers · 0 benchmarks · Images, Texts

SI-HDR (Single-image high dynamic range dataset)

The dataset consists of 181 HDR images. Each image includes: 1) a RAW exposure stack, 2) an HDR image, 3) simulated camera images at two different exposures, and 4) results of six single-image HDR reconstruction methods: Endo et al. 2017, Eilertsen et al. 2017, Marnerides et al. 2018, Lee et al. 2018, Liu et al. 2020, and Santos et al. 2020.

8 papers · 0 benchmarks · Images

GQA-REX

A GQA-based dataset with 1,040,830 multi-modal explanations of visual reasoning processes.

8 papers · 24 benchmarks · Images, Texts

COCO-MIG (COCO-MIG benchmark)

The COCO-MIG benchmark (Common Objects in Context Multi-Instance Generation) evaluates the ability of image generators to follow text prompts that specify attributes for multiple object instances. The benchmark consists of 800 sets of examples sampled from the COCO dataset. Following the COCO layouts, each instance is assigned random color information, and corresponding global image descriptions are constructed from templates. COCO-MIG also provides a complete pipeline for resampling and evaluation. For relevant tools and specific details, please refer to the project's homepage.

8 papers · 8 benchmarks · Images, Texts
Page 57 of 164