Datasets

3,275 machine learning datasets

3,275 dataset results

MVTEC 3D-AD (THE MVTEC 3D ANOMALY DETECTION DATASET)

MVTec 3D Anomaly Detection Dataset (MVTec 3D-AD) is a comprehensive 3D dataset for the task of unsupervised anomaly detection and localization. It contains over 4000 high-resolution scans acquired by an industrial 3D sensor. Each of the 10 different object categories comprises a set of defect-free training and validation samples and a test set of samples with various kinds of defects. Precise ground-truth annotations are provided for each anomalous test sample.

45 papers5 benchmarksImages, Point cloud

RSTPReid (Real Scenario Text-based Person Re-identification)

RSTPReid contains 20505 images of 4,101 persons from 15 cameras. Each person has 5 corresponding images taken by different cameras with complex both indoor and outdoor scene transformations and backgrounds in various periods of time, which makes RSTPReid much more challenging and more adaptable to real scenarios. Each image is annotated with 2 textual descriptions. For data division, 3701 (index < 18505), 200 (18505 <= index < 19505) and 200 (index >= 19505) identities are utilized for training, validation and testing, respectively (Marked by item 'split' in the JSON file). Each sentence is no shorter than 23 words.

45 papers13 benchmarksImages, Texts

ScanNet200

The ScanNet200 benchmark studies 200-class 3D semantic segmentation - an order of magnitude more class categories than previous 3D scene understanding benchmarks. The source of scene data is identical to ScanNet, but parses a larger vocabulary for semantic and instance segmentation

45 papers18 benchmarks3D, 3d meshes, Images, RGB-D

BIWI

The dataset contains over 15K images of 20 people (6 females and 14 males - 4 people were recorded twice). For each frame, a depth image, the corresponding rgb image (both 640x480 pixels), and the annotation is provided. The head pose range covers about +-75 degrees yaw and +-60 degrees pitch. Ground truth is provided in the form of the 3D location of the head and its rotation.

44 papers21 benchmarksImages, Videos

SCUT-CTW1500

The SCUT-CTW1500 dataset contains 1,500 images: 1,000 for training and 500 for testing. In particular, it provides 10,751 cropped text instance images, including 3,530 with curved text. The images are manually harvested from the Internet, image libraries such as Google Open-Image, or phone cameras. The dataset contains a lot of horizontal and multi-oriented text.

44 papers6 benchmarksImages, Texts

PanoContext

The PanoContext dataset contains 500 annotated cuboid layouts of indoor environments such as bedrooms and living rooms.

44 papers5 benchmarks3D, Images

Oxford105k

Oxford105k is the combination of the Oxford5k dataset and 99782 negative images crawled from Flickr using 145 most popular tags. This dataset is used to evaluate search performance for object retrieval (reported as mAP) on a large scale.

44 papers0 benchmarksImages

UR-FUNNY

For understanding multimodal language used in expressing humor.

44 papers0 benchmarksAudio, Images, Texts

Occ3D

Occ3D is a dataset for 3D occupancy prediction, which aims to estimate the detailed occupancy and semantics of objects from multi-view images. To facilitate this task, a label generation pipeline that produces dense, visibility-aware labels for a given scene. This pipeline includes point cloud aggregation, point labeling, and occlusion handling.

44 papers0 benchmarksImages

COCO-O

COCO-O(ut-of-distribution) contains 6 domains (sketch, cartoon, painting, weather, handmake, tattoo) of COCO objects which are hard to be detected by most existing detectors. The dataset has a total of 6,782 images and 26,624 labelled bounding boxes.

44 papers10 benchmarksImages

Stacked MNIST

The Stacked MNIST dataset is derived from the standard MNIST dataset with an increased number of discrete modes. 240,000 RGB images in the size of 32×32 are synthesized by stacking three random digit images from MNIST along the color channel, resulting in 1,000 explicit modes in a uniform distribution corresponding to the number of possible triples of digits.

43 papers2 benchmarksImages

DensePASS

DensePASS - a novel densely annotated dataset for panoramic segmentation under cross-domain conditions, specifically built to study the Pinhole-to-Panoramic transfer and accompanied with pinhole camera training examples obtained from Cityscapes. DensePASS covers both, labelled- and unlabelled 360-degree images, with the labelled data comprising 19 classes which explicitly fit the categories available in the source domain (i.e. pinhole) data.

43 papers2 benchmarksImages

WebQA

WebQA, is a new benchmark for multimodal multihop reasoning in which systems are presented with the same style of data as humans when searching the web: Snippets and Images. The system must then identify which information is relevant across modalities and combine it with reasoning to answer the query. Systems will be evaluated on both the correctness of their answers and their sources.

43 papers0 benchmarksImages, Texts

Occluded-DukeMTMC

Occluded-DukeMTMC contains 15,618 training images, 17,661 gallery images, and 2,210 occluded query images. The experiment results on Occluded-DukeMTMC will demonstrate the superiority of our method in Occluded Person Re-ID problems, let alone that our method does not need any manually cropping procedure as pre-process.

43 papers2 benchmarksImages

PosePrior

Accurate modeling of priors over 3D human pose is fundamental to many problems in computer vision.

43 papers0 benchmarksImages

ImageNet-S (ImageNet Semantic Segmentation)

Powered by the ImageNet dataset, unsupervised learning on large-scale data has made significant advances for classification tasks. There are two major challenges to allowing such an attractive learning modality for segmentation tasks: i) a large-scale benchmark for assessing algorithms is missing; ii) unsupervised shape representation learning is difficult. We propose a new problem of large-scale unsupervised semantic segmentation (LUSS) with a newly created benchmark dataset to track the research progress. Based on the ImageNet dataset, we propose the ImageNet-S dataset with 1.2 million training images and 50k high-quality semantic segmentation annotations for evaluation. Our benchmark has a high data diversity and a clear task objective. We also present a simple yet effective baseline method that works surprisingly well for LUSS. In addition, we benchmark related un/weakly/fully supervised methods accordingly, identifying the challenges and possible directions of LUSS.

43 papers9 benchmarksImages

gRefCOCO

gRefCOCO is the first large-scale Generalized Referring Expression Segmentation dataset that contains multi-target, no-target, and single-target expressions.

43 papers6 benchmarksImages, Texts

CASIA-FASD

CASIA-FASD is a small face anti-spoofing dataset containing 50 subjects.

42 papers0 benchmarksImages

Oxford RobotCar Dataset

The Oxford RobotCar Dataset contains over 100 repetitions of a consistent route through Oxford, UK, captured over a period of over a year. The dataset captures many different combinations of weather, traffic and pedestrians, along with longer term changes such as construction and roadworks.

42 papers3 benchmarksImages

E-commerce

We release E-commerce Dialogue Corpus, comprising a training data set, a development set and a test set for retrieval based chatbot. The statistics of E-commerical Conversation Corpus are shown in the following table.

42 papers3 benchmarksImages, Texts

PreviousPage 25 of 164Next