Datasets

3,275 machine learning datasets

3,275 dataset results

Office-31 (Office Dataset)

The Office dataset contains 31 object categories in three domains: Amazon, DSLR and Webcam. The 31 categories in the dataset consist of objects commonly encountered in office settings, such as keyboards, file cabinets, and laptops. The Amazon domain contains on average 90 images per class and 2817 images in total. As these images were captured from a website of online merchants, they are captured against clean background and at a unified scale. The DSLR domain contains 498 low-noise high resolution images (4288×2848). There are 5 objects per category. Each object was captured from different viewpoints on average 3 times. For Webcam, the 795 images of low resolution (640×480) exhibit significant noise and color as well as white balance artifacts.

643 papers12 benchmarksImages

CheXpert

The CheXpert dataset contains 224,316 chest radiographs of 65,240 patients with both frontal and lateral views available. The task is to do automated chest x-ray interpretation, featuring uncertainty labels and radiologist-labeled reference standard evaluation sets.

628 papers6 benchmarksImages, Medical

iNaturalist

The iNaturalist 2017 dataset (iNat) contains 675,170 training and validation images from 5,089 natural fine-grained categories. Those categories belong to 13 super-categories including Plantae (Plant), Insecta (Insect), Aves (Bird), Mammalia (Mammal), and so on. The iNat dataset is highly imbalanced with dramatically different number of images per category. For example, the largest super-category “Plantae (Plant)” has 196,613 images from 2,101 categories; whereas the smallest super-category “Protozoa” only has 381 images from 4 categories.

603 papers9 benchmarksImages

ImageNet-C

ImageNet-C is an open source data set that consists of algorithmically generated corruptions (blur, noise) applied to the ImageNet test-set.

602 papers9 benchmarksImages

Urban100

The Urban100 dataset contains 100 images of urban scenes. It commonly used as a test set to evaluate the performance of super-resolution models. Image Source: http://vllab.ucmerced.edu/wlai24/LapSRN/

591 papers0 benchmarksImages

VoxCeleb2

VoxCeleb2 is a large scale speaker recognition dataset obtained automatically from open-source media. VoxCeleb2 consists of over a million utterances from over 6k speakers. Since the dataset is collected ‘in the wild’, the speech segments are corrupted with real world noise including laughter, cross-talk, channel effects, music and other sounds. The dataset is also multilingual, with speech from speakers of 145 different nationalities, covering a wide range of accents, ages, ethnicities and languages. The dataset is audio-visual, so is also useful for a number of other applications, for example – visual speech synthesis, speech separation, cross-modal transfer from face to voice or vice versa and training face recognition from video to complement existing face recognition datasets.

564 papers3 benchmarksAudio, Images, Texts, Videos

LVIS

LVIS is a dataset for long tail instance segmentation. It has annotations for over 1000 object categories in 164k images.

551 papers0 benchmarksImages

VGGFace2 (Vggface2: A dataset for recognising faces across pose and age)

VGGFace2 is a large-scale face recognition dataset. Images are downloaded from Google Image Search and have large variations in pose, age, illumination, ethnicity and profession. VGGFace2 contains images from identities spanning a wide range of different ethnicities, accents, professions and ages. All face images are captured "in the wild", with pose and emotion variations and different lighting and occlusion conditions. Face distribution for different identities is varied, from 87 to 843, with an average of 362 images for each subject.

539 papers2 benchmarksImages

SYNTHIA (SYNTHetic Collection of Imagery and Annotations)

The SYNTHIA dataset is a synthetic dataset that consists of 9400 multi-viewpoint photo-realistic frames rendered from a virtual city and comes with pixel-level semantic annotations for 13 classes. Each frame has resolution of 1280 × 960.

538 papers2 benchmarksImages

VGG-Face2 (Vggface2: A dataset for recognising faces across pose and age)

533 papers0 benchmarksImages

Set14

The Set14 dataset is a dataset consisting of 14 images commonly used for testing performance of Image Super-Resolution models. Image Source: https://www.ece.rice.edu/~wakin/images/

532 papers4 benchmarksImages

Places205

The Places205 dataset is a large-scale scene-centric dataset with 205 common scene categories. The training dataset contains around 2,500,000 images from these categories. In the training set, each scene category has the minimum 5,000 and maximum 15,000 images. The validation set contains 100 images per category (a total of 20,500 images), and the testing set includes 200 images per category (a total of 41,000 images).

525 papers1 benchmarksImages

FGVC-Aircraft

FGVC-Aircraft contains 10,200 images of aircraft, with 100 images for each of 102 different aircraft model variants, most of which are airplanes. The (main) aircraft in each image is annotated with a tight bounding box and a hierarchical airplane model label. Aircraft models are organized in a four-levels hierarchy. The four levels, from finer to coarser, are:

520 papers5 benchmarksImages

MPII (MPII Human Pose)

The MPII Human Pose Dataset for single person pose estimation is composed of about 25K images of which 15K are training samples, 3K are validation samples and 7K are testing samples (which labels are withheld by the authors). The images are taken from YouTube videos covering 410 different human activities and the poses are manually annotated with up to 16 body joints.

495 papers3 benchmarksImages

CIFAR-10C

Common corruptions dataset for CIFAR10

494 papers3 benchmarksImages

ImageNet-R (ImageNet-Rendition)

ImageNet-R(endition) contains art, cartoons, deviantart, graffiti, embroidery, graphics, origami, paintings, patterns, plastic objects, plush objects, sculptures, sketches, tattoos, toys, and video game renditions of ImageNet classes.

481 papers10 benchmarksImages

Waymo Open Dataset

The Waymo Open Dataset is comprised of high resolution sensor data collected by autonomous vehicles operated by the Waymo Driver in a wide variety of conditions.

481 papers47 benchmarks3D, Images, LiDAR

SUN RGB-D

The SUN RGBD dataset contains 10335 real RGB-D images of room scenes. Each RGB image has a corresponding depth and segmentation map. As many as 700 object categories are labeled. The training and testing sets contain 5285 and 5050 images, respectively.

477 papers26 benchmarksImages, Interactive, Point cloud, RGB-D

TextVQA

TextVQA is a dataset to benchmark visual reasoning based on text in images. TextVQA requires models to read and reason about text in images to answer questions about them. Specifically, models need to incorporate a new modality of text present in the images and reason over it to answer TextVQA questions.

476 papers1 benchmarksImages, Texts

USPS

USPS is a digit dataset automatically scanned from envelopes by the U.S. Postal Service containing a total of 9,298 16×16 pixel grayscale samples; the images are centered, normalized and show a broad range of font styles.

459 papers3 benchmarksImages

PreviousPage 3 of 164Next