Datasets

3,275 machine learning datasets

Cholec80 (Surgical Workflow Dataset)

Cholec80 is an endoscopic video dataset containing 80 videos of cholecystectomy surgeries performed by 13 surgeons. The videos are captured at 25 fps and downsampled to 1 fps for processing. The whole dataset is labeled with phase and tool-presence annotations. The phases were defined by a senior surgeon at the Strasbourg hospital, France. Because the tools are sometimes barely visible in the images and thus difficult to recognize visually, a tool is defined as present in an image if at least half of the tool tip is visible.

134 papers | 7 benchmarks | Images, Medical, Videos
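
The 25 fps to 1 fps downsampling mentioned in the Cholec80 description can be sketched as keeping one frame per second of video. This is only a minimal illustration: the function name `downsampled_frame_indices` is hypothetical, and keeping the first frame of each second is an assumption, not necessarily the procedure the dataset authors used.

```python
def downsampled_frame_indices(num_frames, src_fps=25, dst_fps=1):
    """Return the indices of the frames kept when reducing src_fps to dst_fps.

    Assumption: the first frame of every group of (src_fps // dst_fps)
    frames is kept; the original dataset pipeline may choose differently.
    """
    if dst_fps <= 0 or src_fps <= 0:
        raise ValueError("frame rates must be positive")
    if src_fps % dst_fps != 0:
        raise ValueError("src_fps must be an integer multiple of dst_fps")
    step = src_fps // dst_fps  # 25 for the 25 fps -> 1 fps case
    return list(range(0, num_frames, step))


if __name__ == "__main__":
    # A 4-second clip at 25 fps has 100 frames; one frame per second is kept.
    print(downsampled_frame_indices(100))
```

Under these assumptions, a one-minute clip recorded at 25 fps (1,500 frames) would yield 60 frames for processing.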

Replay-Attack

The Replay-Attack Database for face spoofing consists of 1,300 video clips of photo and video attack attempts on 50 clients, recorded under different lighting conditions. All videos are generated either by having a (real) client try to access a laptop through a built-in webcam or by displaying a photo or video recording of the same client for at least 9 seconds.

133 papers | 16 benchmarks | Images, Videos

Virtual KITTI

Virtual KITTI is a photo-realistic synthetic video dataset designed to learn and evaluate computer vision models for several video understanding tasks: object detection and multi-object tracking, scene-level and instance-level semantic segmentation, optical flow, and depth estimation.

133 papers | 0 benchmarks | Images, Videos

RobustBench

RobustBench is an adversarial robustness benchmark that aims to reflect the robustness of the considered models as accurately as possible within a reasonable computational budget. To this end, the authors start with the image classification task and introduce restrictions (possibly loosened in the future) on the allowed models.

133 papers | 0 benchmarks | Images

CORe50

CORe50 is a dataset designed for assessing Continual Learning techniques in an Object Recognition context.

132 papers | 0 benchmarks | Images

Panoptic (CMU Panoptic Studio)

CMU Panoptic is a large-scale dataset providing 1.5 million 3D pose annotations for multiple people engaging in social activities. It contains 65 videos (5.5 hours) with multi-view annotations, but only 17 of them show multi-person scenarios and include camera parameters.

131 papers | 10 benchmarks | Images

VOT2018

VOT2018 is a dataset for visual object tracking. It consists of 60 challenging videos collected from real-life datasets.

129 papers | 4 benchmarks | Images, Videos

CityPersons

The CityPersons dataset is a subset of Cityscapes that consists only of person annotations. There are 2,975 images for training, 500 for validation, and 1,575 for testing. On average, an image contains 7 pedestrians. Both visible-region and full-body annotations are provided.

129 papers | 21 benchmarks | Images

ORL (Our Database of Faces)

The ORL Database of Faces contains 400 images from 40 distinct subjects. For some subjects, the images were taken at different times, varying the lighting, facial expressions (open / closed eyes, smiling / not smiling) and facial details (glasses / no glasses). All the images were taken against a dark homogeneous background with the subjects in an upright, frontal position (with tolerance for some side movement). The size of each image is 92x112 pixels, with 256 grey levels per pixel.

129 papers | 1 benchmark | Images

Make3D

The Make3D dataset is a monocular depth estimation dataset that contains 400 training pairs of single RGB images and depth maps, plus 134 test samples. The RGB images have high resolution, while the depth maps are provided at low resolution.

129 papers | 6 benchmarks | Images

LFPW (Labeled Face Parts in the Wild)

The Labeled Face Parts in the Wild (LFPW) dataset consists of 1,432 faces from images downloaded from the web using simple text queries on sites such as google.com, flickr.com, and yahoo.com. Each image was labeled by three MTurk workers, and 29 fiducial points are included in the dataset.

128 papers | 0 benchmarks | Images

Meta-Dataset

The Meta-Dataset benchmark is a large few-shot learning benchmark that consists of multiple datasets with different data distributions. It does not restrict few-shot tasks to fixed ways and shots, and thus represents a more realistic scenario. It consists of 10 datasets from diverse domains.

128 papers | 2 benchmarks | Images

PASCAL VOC 2007

PASCAL VOC 2007 is a dataset for image recognition. It covers twenty selected object classes.

126 papers | 66 benchmarks | Images

FBMS (Freiburg-Berkeley Motion Segmentation)

The Freiburg-Berkeley Motion Segmentation Dataset (FBMS-59) is an extension of the BMS dataset with 33 additional video sequences. A total of 720 frames is annotated. It has pixel-accurate segmentation annotations of moving objects. FBMS-59 comes with a split into a training set and a test set.

126 papers | 4 benchmarks | Images, Videos

SUN3D

SUN3D contains a large-scale RGB-D video database, with 8 annotated sequences. Each frame has a semantic segmentation of the objects in the scene and information about the camera pose. It is composed of 415 sequences captured in 254 different spaces in 41 different buildings. Moreover, some places were captured multiple times at different times of day.

126 papers | 0 benchmarks | 3D, Images, Point cloud, RGB-D, Videos

LUNA

The LUNA challenges provide datasets for automatic nodule detection algorithms using the largest publicly available reference database of chest CT scans, the LIDC-IDRI data set. In LUNA16, participants develop their algorithm and upload their predictions on 888 CT scans in one of the two tracks: 1) the complete nodule detection track where a complete CAD system should be developed, or 2) the false positive reduction track where a provided set of nodule candidates should be classified.

125 papers | 4 benchmarks | Images, Medical

FreiHAND

FreiHAND is a 3D hand pose dataset that records different hand actions performed by 32 people. For each hand image, MANO-based 3D hand pose annotations are provided. It currently contains 32,560 unique training samples and 3,960 unique samples for evaluation. The training samples are recorded against a green-screen background, which allows for background removal. In addition, three different post-processing strategies are applied to the training samples for data augmentation. These post-processing strategies are not applied to the evaluation samples.

125 papers | 24 benchmarks | 3D, Images

Aff-Wild

Aff-Wild is a large-scale in-the-wild dataset for valence-arousal estimation from videos with a variety of head poses, illumination conditions and occlusions.

125 papers | 0 benchmarks | Images

MSRA-TD500 (MSRA Text Detection 500 Database)

The MSRA-TD500 dataset is a text detection dataset that contains 300 training images and 200 test images. Text regions are arbitrarily oriented and annotated at the sentence level. Unlike other datasets, it contains both English and Chinese text.

124 papers | 4 benchmarks | Images, Texts

FER+ (Face Expression Recognition Plus dataset)

The FER+ dataset is an extension of the original FER dataset, where the images have been re-labelled into one of 8 emotion types: neutral, happiness, surprise, sadness, anger, disgust, fear, and contempt.

124 papers | 7 benchmarks | Images
