Datasets

3,275 machine learning datasets

3,275 dataset results

Perceptual Similarity

Perceptual Similarity is a dataset of human perceptual similarity judgments.

The Set5 dataset is a dataset consisting of 5 images (“baby”, “bird”, “butterfly”, “head”, “woman”) commonly used for testing performance of Image Super-Resolution models. Image Source: http://people.rennes.inria.fr/Aline.Roumy/results/SR_BMVC12.html

444 papers1 benchmarksImages

RefCOCO

The RefCOCO dataset is a referring expression generation (REG) dataset used for tasks related to understanding natural language expressions that refer to specific objects in images. Here are the key details about RefCOCO:

439 papers4 benchmarksImages, Texts

ImageNet-A

The ImageNet-A dataset consists of real-world, unmodified, and naturally occurring examples that are misclassified by ResNet models.

431 papers10 benchmarksImages

CUHK03 (Chinese University of Hong Kong Re-identification)

The CUHK03 consists of 14,097 images of 1,467 different identities, where 6 campus cameras were deployed for image collection and each identity is captured by 2 campus cameras. This dataset provides two types of annotations, one by manually labelled bounding boxes and the other by bounding boxes produced by an automatic detector. The dataset also provides 20 random train/test splits in which 100 identities are selected for testing and the rest for training

419 papers4 benchmarksImages

CASIA-WebFace

The CASIA-WebFace dataset is used for face verification and face identification tasks. The dataset contains 494,414 face images of 10,575 real identities collected from the web.

415 papers0 benchmarksImages

Replica

The Replica Dataset is a dataset of high quality reconstructions of a variety of indoor spaces. Each reconstruction has clean dense geometry, high resolution and high dynamic range textures, glass and mirror surface information, planar segmentation as well as semantic class and instance segmentation.

414 papers9 benchmarks3D, Images

GTA5 (Grand Theft Auto 5)

The GTA5 dataset contains 24966 synthetic images with pixel level semantic annotation. The images have been rendered using the open-world video game Grand Theft Auto 5 and are all from the car perspective in the streets of American-style virtual cities. There are 19 semantic classes which are compatible with the ones of Cityscapes dataset.

412 papers0 benchmarksImages

MVTecAD (MVTEC ANOMALY DETECTION DATASET)

MVTec AD is a dataset for benchmarking anomaly detection methods with a focus on industrial inspection. It contains over 5000 high-resolution images divided into fifteen different object and texture categories. Each category comprises a set of defect-free training images and a test set of images with various kinds of defects as well as images without defects.

402 papers3 benchmarksImages

Caltech-256

Caltech-256 is an object recognition dataset containing 30,607 real-world images, of different sizes, spanning 257 classes (256 object classes and an additional clutter class). Each class is represented by at least 80 images. The dataset is a superset of the Caltech-101 dataset.

401 papers2 benchmarksImages

DeepFashion

DeepFashion is a dataset containing around 800K diverse fashion images with their rich annotations (46 categories, 1,000 descriptive attributes, bounding boxes and landmark information) ranging from well-posed product images to real-world-like consumer photos.

397 papers6 benchmarksImages

3DPW

The 3D Poses in the Wild dataset is the first dataset in the wild with accurate 3D poses for evaluation. While other datasets outdoors exist, they are all restricted to a small recording volume. 3DPW is the first one that includes video footage taken from a moving phone camera.

395 papers44 benchmarks3D, Images, Videos

GoPro

The GoPro dataset for deblurring consists of 3,214 blurred images with the size of 1,280×720 that are divided into 2,103 training images and 1,111 test images. The dataset consists of pairs of a realistic blurry image and the corresponding ground truth sharp image that are obtained by a high-speed camera.

390 papers33 benchmarksImages, Videos

Argoverse

Argoverse is a tracking benchmark with over 30K scenarios collected in Pittsburgh and Miami. Each scenario is a sequence of frames sampled at 10 HZ. Each sequence has an interesting object called “agent”, and the task is to predict the future locations of agents in a 3 seconds future horizon. The sequences are split into training, validation and test sets, which have 205,942, 39,472 and 78,143 sequences respectively. These splits have no geographical overlap.

386 papers22 benchmarksImages, LiDAR, Point cloud, Videos

GTSRB (German Traffic Sign Recognition Benchmark)

The German Traffic Sign Recognition Benchmark (GTSRB) contains 43 classes of traffic signs, split into 39,209 training images and 12,630 test images. The images have varying light conditions and rich backgrounds.

374 papers3 benchmarksImages

FaceForensics++

FaceForensics++ is a forensics dataset consisting of 1000 original video sequences that have been manipulated with four automated face manipulation methods: Deepfakes, Face2Face, FaceSwap and NeuralTextures. The data has been sourced from 977 youtube videos and all videos contain a trackable mostly frontal face without occlusions which enables automated tampering methods to generate realistic forgeries.

368 papers62 benchmarksImages, Videos

OK-VQA (Outside Knowledge Visual Question Answering)

Outside Knowledge Visual Question Answering (OK-VQA) includes more than 14,000 questions that require external knowledge to answer.

368 papers4 benchmarksImages, Texts

Visual Question Answering v2.0 (VQA v2.0)

Visual Question Answering (VQA) v2.0 is a dataset containing open-ended questions about images. These questions require an understanding of vision, language and commonsense knowledge to answer. It is the second version of the VQA dataset.

366 papers0 benchmarksImages, Texts

Conceptual Captions

Automatic image captioning is the task of producing a natural-language utterance (usually a sentence) that correctly reflects the visual content of an image. Up to this point, the resource most used for this task was the MS-COCO dataset, containing around 120,000 images and 5-way image-caption annotations (produced by paid annotators).

352 papers7 benchmarksImages, Texts

NUS-WIDE

The NUS-WIDE dataset contains 269,648 images with a total of 5,018 tags collected from Flickr. These images are manually annotated with 81 concepts, including objects and scenes.

348 papers3 benchmarksImages

PreviousPage 4 of 164Next