Datasets

3,275 machine learning datasets


Taskonomy

Taskonomy provides a large and high-quality dataset of varied indoor scenes.

aPY (Attribute Pascal and Yahoo)

aPY is a coarse-grained dataset composed of 15,339 images from 3 broad categories (animals, objects and vehicles), further divided into a total of 32 subcategories (aeroplane, …, zebra).

147 papers · 5 benchmarks · Images

2D-3D-S (2D-3D-Semantic)

The 2D-3D-S dataset provides a variety of mutually registered modalities from the 2D, 2.5D and 3D domains, with instance-level semantic and geometric annotations. It covers over 6,000 m² collected in 6 large-scale indoor areas that originate from 3 different buildings. It contains over 70,000 RGB images, along with the corresponding depths, surface normals, semantic annotations and global XYZ images (all as both regular and 360° equirectangular images), as well as camera information. It also includes registered raw and semantically annotated 3D meshes and point clouds. The dataset enables the development of joint and cross-modal learning models, and potentially of unsupervised approaches that exploit the regularities present in large-scale indoor spaces.

147 papers · 6 benchmarks · Images

SumMe

The SumMe dataset is a video summarization dataset consisting of 25 videos, each annotated with at least 15 human summaries (390 in total).

146 papers · 14 benchmarks · Audio, Images, Videos

STARE (Structured Analysis of the Retina)

STARE (Structured Analysis of the Retina) is a dataset for retinal vessel segmentation. It contains 20 equal-sized (700×605) color fundus images. For each image, two groups of annotations are provided.

146 papers · 28 benchmarks · Images, Medical

VQA-RAD (Visual Question Answering in Radiology)

VQA-RAD consists of 3,515 question–answer pairs on 315 radiology images.

145 papers · 0 benchmarks · Images, Medical, Texts

Set12

Set12 is a collection of 12 grayscale images of different scenes that are widely used for evaluation of image denoising methods. The size of each image is 256×256.

145 papers · 0 benchmarks · Images

fMoW (Functional Map of the World)

Functional Map of the World (fMoW) is a dataset that aims to inspire the development of machine learning models capable of predicting the functional purpose of buildings and land use from temporal sequences of satellite images and a rich set of metadata features.

144 papers · 0 benchmarks · Images

MPII Human Pose

The MPII Human Pose dataset is a dataset for human pose estimation. It consists of around 25k images extracted from online videos. Each image contains one or more people, with over 40k people annotated in total. Of the 40k samples, ∼28k are for training and the remainder are for testing. Overall the dataset covers 410 human activities, and each image is provided with an activity label. Each image was extracted from a YouTube video and is provided with the preceding and following un-annotated frames.

143 papers · 3 benchmarks · Images, Videos

NABirds (North America Birds)

NABirds V1 is a collection of 48,000 annotated photographs of the 400 species of birds that are commonly observed in North America. More than 100 photographs are available for each species, including separate annotations for males, females and juveniles that comprise 700 visual categories. This dataset is to be used for fine-grained visual categorization experiments.

143 papers · 2 benchmarks · Images

Pix3D

The Pix3D dataset is a large-scale benchmark of diverse image-shape pairs with pixel-level 2D-3D alignment. Pix3D has wide applications in shape-related tasks including reconstruction, retrieval, viewpoint estimation, etc.

142 papers · 15 benchmarks · 3D, Images

Flickr30K Entities

The Flickr30K Entities dataset is an extension to the Flickr30K dataset. It augments the original 158k captions with 244k coreference chains, linking mentions of the same entities across different captions for the same image, and associating them with 276k manually annotated bounding boxes. This is used to define a new benchmark for localization of textual entity mentions in an image.

142 papers · 0 benchmarks · Images, Texts

CBSD68 (Color BSD68)

The Color BSD68 dataset for image denoising benchmarks is part of the Berkeley Segmentation Dataset and Benchmark. It is used to measure the performance of image denoising algorithms and contains 68 images.

142 papers · 0 benchmarks · Images

Aff-Wild2

Aff-Wild2 is a large-scale in-the-wild database for affect recognition and an extension of the Aff-Wild dataset. It approximately doubles the number of included video frames and the number of subjects, thus improving the variability of the included behaviors and of the persons involved. It is the only existing in-the-wild database with annotations for all 3 main behaviour tasks.

142 papers · 13 benchmarks · Images

CAMO (Camouflaged Object)

The Camouflaged Object (CAMO) dataset is specifically designed for the task of camouflaged object segmentation. It focuses on two categories, naturally camouflaged objects and artificially camouflaged objects, which usually correspond to animals and humans in the real world, respectively. The camouflaged object portion consists of 1,250 images (1,000 for the training set and 250 for the testing set). The non-camouflaged object images are collected from the MS-COCO dataset (1,000 for the training set and 250 for the testing set). CAMO provides objectness mask ground truth.

139 papers · 42 benchmarks · Images
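
The CAMO split figures quoted above can be tallied with a short sketch. It only restates the numbers from the description; the dictionary layout and the names `camo_split` and `split_totals` are hypothetical and are not part of any official CAMO tooling.

```python
# Image counts as quoted in the CAMO description above.
# The nested-dict layout is an illustrative assumption, not an official format.
camo_split = {
    "camouflaged": {"train": 1000, "test": 250},
    "non_camouflaged": {"train": 1000, "test": 250},  # collected from MS-COCO
}


def split_totals(split):
    """Sum image counts per partition (train/test) across both image types."""
    totals = {}
    for counts in split.values():
        for partition, n_images in counts.items():
            totals[partition] = totals.get(partition, 0) + n_images
    return totals


totals = split_totals(camo_split)
total_images = sum(totals.values())
```

Under these figures the combined collection holds 2,000 training and 500 testing images, half of them camouflaged.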

FC100 (Fewshot-CIFAR100)

The FC100 dataset (Fewshot-CIFAR100) is a split of CIFAR-100 designed for few-shot learning. Its 20 high-level categories are divided into 12, 4 and 4 categories for training, validation and test, respectively. The corresponding splits contain 60, 20 and 20 low-level classes, each with 600 images of size 32 × 32. The small image size makes few-shot learning more challenging.

137 papers · 0 benchmarks · Images
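
The FC100 split sizes follow directly from the figures in the description, as the sketch below shows. It assumes the standard CIFAR-100 structure of 5 low-level classes per high-level category (consistent with the 60/20/20 class counts above); the names `superclasses` and `split_sizes` are hypothetical.

```python
# Assumption: each CIFAR-100 high-level category contains 5 low-level classes.
CLASSES_PER_SUPERCLASS = 5
IMAGES_PER_CLASS = 600  # as stated in the FC100 description

# High-level categories assigned to each split, as quoted above.
superclasses = {"train": 12, "val": 4, "test": 4}


def split_sizes(superclass_counts):
    """Derive the number of classes and images in each split."""
    sizes = {}
    for name, n_super in superclass_counts.items():
        n_classes = n_super * CLASSES_PER_SUPERCLASS
        sizes[name] = {
            "classes": n_classes,
            "images": n_classes * IMAGES_PER_CLASS,
        }
    return sizes


sizes = split_sizes(superclasses)
```

Because the split is made at the level of high-level categories, no training class shares a category with a validation or test class.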

Oxford5k (Oxford Buildings)

Oxford5K is the Oxford Buildings Dataset, which contains 5062 images collected from Flickr. It offers a set of 55 queries for 11 landmark buildings, five for each landmark.

137 papers · 1 benchmark · Images

SEED-Bench

SEED-Bench consists of 19K multiple-choice questions with accurate human annotations (about 6× larger than existing benchmarks), spanning 12 evaluation dimensions that cover comprehension of both the image and video modalities.

137 papers · 0 benchmarks · Images, Videos

VIPeR (Viewpoint Invariant Pedestrian Recognition)

The Viewpoint Invariant Pedestrian Recognition (VIPeR) dataset includes 632 people captured by two outdoor cameras under different viewpoints and lighting conditions. Each person has one image per camera, and each image has been scaled to 128×48 pixels. It provides the pose angle of each person as 0° (front), 45°, 90° (right), 135° or 180° (back).

134 papers · 0 benchmarks · Images

VehicleID (PKU VehicleID)

The “VehicleID” dataset contains cars captured during the daytime by multiple real-world surveillance cameras distributed across a small city in China. There are 26,267 vehicles (221,763 images in total) in the entire dataset. Each image is labelled with an ID corresponding to the vehicle's real-world identity. In addition, 10,319 vehicles (90,196 images in total) are manually labelled with vehicle model information (e.g. “MINI-cooper”, “Audi A6L” and “BMW 1 Series”).

134 papers · 2 benchmarks · Images

