Datasets

3,275 machine learning datasets

3,275 dataset results

AFLW-19 (The 19 landmark variant of AFLW.)

The original AFLW provides at most 21 points for each face, but excluding coordinates for invisible landmarks, causing difficulties for training most of the existing baseline approaches. To make fair comparisons, the authors manually annotate the coordinates of these invisible landmarks to enable training of those baseline approaches. The new annotation does not include two ear points because it is very difficult to decide the location of invisible ears. This causes the point number of AFLW-19 to be 19.

24 papers20 benchmarksImages

DADA-seg

DADA-seg is a pixel-wise annotated accident dataset, which contains a variety of critical scenarios from traffic accidents. It is used for semantic segmentation.

24 papers2 benchmarksImages

KITTI-STEP

The Segmenting and Tracking Every Pixel (STEP) benchmark consists of 21 training sequences and 29 test sequences. It is based on the KITTI Tracking Evaluation and the Multi-Object Tracking and Segmentation (MOTS) benchmark. This benchmark extends the annotations to the Segmenting and Tracking Every Pixel (STEP) task. [Copy-pasted from http://www.cvlibs.net/datasets/kitti/eval_step.php]

24 papers15 benchmarksImages

105,941 Images Natural Scenes OCR Data of 12 Languages

Description: 105,941 Images Natural Scenes OCR Data of 12 Languages. The data covers 12 languages (6 Asian languages, 6 European languages), multiple natural scenes, multiple photographic angles. For annotation, line-level quadrilateral bounding box annotation and transcription for the texts were annotated in the data. The data can be used for tasks such as OCR of multi-language.

24 papers0 benchmarksImages

REALY (Region-aware benchmark based on the LYHM)

The REALY benchmark aims to introduce a region-aware evaluation pipeline to measure the fine-grained normalized mean square error (NMSE) of 3D face reconstruction methods from under-controlled image sets.

24 papers25 benchmarks3D, 3d meshes, Images, RGB-D

MMSE-HR (Multimodal Spontaneous Expression-Heart Rate dataset)

The MMSE-HR benchmark consists of a dataset of 102 videos from 40 subjects recorded at 1040x1392 raw resolution at 25fps. During the recordings, various stimuli such as videos, sounds, and smells are introduced to induce different emotional states in the subjects. The ground truth waveform for MMSE-HR is the blood pressure signal sampled at 1000Hz. The dataset contains a diverse distribution of skin colors in the Fitzpatrick scale (II=8, III=11, IV=17, V+VI=4).

24 papers20 benchmarksImages, Medical

InterHuman

InterHuman is a multimodal dataset, named InterHuman. It consists of about 107M frames for diverse two-person interactions, with accurate skeletal motions and 16,756 natural language descriptions.

24 papers16 benchmarksImages, Texts

HarMeme

HarMeme is a benchmark dataset for hateful meme classification containing 3, 544 memes related to COVID-19 collected from the Internet

24 papers2 benchmarksImages

PASCAL Face

The PASCAL FACE dataset is a dataset for face detection and face recognition. It has a total of 851 images which are a subset of the PASCAL VOC and has a total of 1,341 annotations. These datasets contain only a few hundreds of images and have limited variations in face appearance.

23 papers6 benchmarksImages

PhysioNet Challenge 2012

The PhysioNet Challenge 2012 dataset is publicly available and contains the de-identified records of 8000 patients in Intensive Care Units (ICU). Each record consists of roughly 48 hours of multivariate time series data with up to 37 features recorded at various times from the patients during their stay such as respiratory rate, glucose etc.

23 papers17 benchmarksImages, Medical

CIFAR10-DVS

CIFAR10-DVS is an event-stream dataset for object classification. 10,000 frame-based images that come from CIFAR-10 dataset are converted into 10,000 event streams with an event-based sensor, whose resolution is 128×128 pixels. The dataset has an intermediate difficulty with 10 different classes. The repeated closed-loop smooth (RCLS) movement of frame-based images is adopted to implement the conversion. Due to the transformation, they produce rich local intensity changes in continuous time which are quantized by each pixel of the event-based camera.

23 papers2 benchmarksImages

DocUNet (Document Image Unwarping via a Stacked U-Net)

Various documents dataset. Each of the 65 documents includes scanned ground truth images, both hard and easy distorted photos, and document-centered cropped images.

23 papers3 benchmarksImages

ITOP (Invariant-Top View Dataset)

The ITOP dataset consists of 40K training and 10K testing depth images for each of the front-view and top-view tracks. This dataset contains depth images with 20 actors who perform 15 sequences each and is recorded by two Asus Xtion Pro cameras. The ground-truth of this dataset is the 3D coordinates of 15 body joints.

23 papers0 benchmarksImages

DeepWeeds

The DeepWeeds dataset consists of 17,509 images capturing eight different weed species native to Australia in situ with neighbouring flora.

23 papers0 benchmarksImages

UC Merced Land Use Dataset

This is a 21 class land use image dataset meant for research purposes.

23 papers1 benchmarksImages

IXI (IXI Brain Development Dataset)

IXI Dataset is a collection of 600 MR brain images from normal, healthy subjects. The MR image acquisition protocol for each subject includes:

23 papers24 benchmarks3D, Images, Medical

ShapeWorld

ShapeWorld is a new evaluation methodology and framework for multimodal deep learning models, with a focus on formal-semantic style generalization capabilities. In this framework, artificial data is automatically generated according to predefined specifications. This controlled data generation makes it possible to introduce previously unseen instance configurations during evaluation, which consequently require the system to recombine learned concepts in novel ways.

23 papers0 benchmarksImages, Texts

HPS (Human POSEitioning System Dataset)

HPS Dataset is a collection of 3D humans interacting with large 3D scenes (300-1000 $m^2$, up to 2500 $m^2$). The dataset contains images captured from a head-mounted camera coupled with the reference 3D pose and location of the person in a pre-scanned 3D scene. 7 people in 8 large scenes are captured performing activities such as exercising, reading, eating, lecturing, using a computer, making coffee, dancing. The dataset provides more than 300K synchronized RGB images coupled with the reference 3D pose and location.

23 papers0 benchmarks3D, Images, Point cloud

OpenImages-v6

OpenImages V6 is a large-scale dataset , consists of 9 million training images, 41,620 validation samples, and 125,456 test samples. It is a partially annotated dataset, with 9,600 trainable classes

23 papers6 benchmarksImages

HRSOD (High-Resolution Salient Object Detection)

There exist several datasets for saliency detection, but none of them is specifically designed for high-resolution salient object detection. High-Resolution Salient Object Detection (HRSOD) dataset, containing 1610 training images and 400 test images. The total 2010 images are collected from the website of Flickr with the license of all creative commons. Pixel-level ground truths are manually annotated by 40 subjects. The shortest edge of each image in HRSOD is more than 1200 pixels.

23 papers24 benchmarksImages

PreviousPage 35 of 164Next