Datasets

3,275 machine learning datasets

3,275 dataset results

Places-LT

Places-LT has an imbalanced training set with 62,500 images for 365 classes from Places-2. The class frequencies follow a natural power law distribution with a maximum number of 4,980 images per class and a minimum number of 5 images per class. The validation and testing sets are balanced and contain 20 and 100 images per class respectively.

80 papers10 benchmarksImages

ReferItGame

The ReferIt dataset contains 130,525 expressions for referring to 96,654 objects in 19,894 images of natural scenes.

80 papers0 benchmarksImages, Texts

VQG (Visual Question Generation)

VQG is a collection of datasets for visual question generation. VQG questions were collected by crowdsourcing the task on Amazon Mechanical Turk (AMT). The authors provided details on the prompt and the specific instructions for all the crowdsourcing tasks in this paper in the supplementary material. The prompt was successful at capturing nonliteral questions. Images were taken from the MSCOCO dataset.

80 papers0 benchmarksAudio, Images, Texts

VeRi-776

VeRi-776 is a vehicle re-identification dataset which contains 49,357 images of 776 vehicles from 20 cameras. The dataset is collected in the real traffic scenario, which is close to the setting of CityFlow. The dataset contains bounding boxes, types, colors and brands.

79 papers12 benchmarksImages

WIT (Wikipedia-based Image Text)

Wikipedia-based Image Text (WIT) Dataset is a large multimodal multilingual dataset. WIT is composed of a curated set of 37.6 million entity rich image-text examples with 11.5 million unique images across 108 Wikipedia languages. Its size enables WIT to be used as a pretraining dataset for multimodal machine learning models.

79 papers2 benchmarksImages, Texts

MPIIGaze

MPIIGaze is a dataset for appearance-based gaze estimation in the wild. It contains 213,659 images collected from 15 participants during natural everyday laptop use over more than three months. It has a large variability in appearance and illumination.

78 papers1 benchmarksImages

BraTS 2017

The BRATS2017 dataset. It contains 285 brain tumor MRI scans, with four MRI modalities as T1, T1ce, T2, and Flair for each scan. The dataset also provides full masks for brain tumors, with labels for ED, ET, NET/NCR. The segmentation evaluation is based on three tasks: WT, TC and ET segmentation.

77 papers0 benchmarksImages, MRI, Medical

TuSimple

The TuSimple dataset consists of 6,408 road images on US highways. The resolution of image is 1280×720. The dataset is composed of 3,626 for training, 358 for validation, and 2,782 for testing called the TuSimple test set of which the images are under different weather conditions.

76 papers4 benchmarksImages

PETA (Pedestrian Attribute)

The PEdesTrian Attribute dataset (PETA) is a dataset fore recognizing pedestrian attributes, such as gender and clothing style, at a far distance. It is of interest in video surveillance scenarios where face and body close-shots and hardly available. It consists of 19,000 pedestrian images with 65 attributes (61 binary and 4 multi-class). Those images contain 8705 persons.

74 papers2 benchmarksImages

ApolloScape

ApolloScape is a large dataset consisting of over 140,000 video frames (73 street scene videos) from various locations in China under varying weather conditions. Pixel-wise semantic annotation of the recorded data is provided in 2D, with point-wise semantic annotation in 3D for 28 classes. In addition, the dataset contains lane marking annotations in 2D.

74 papers13 benchmarksImages, Videos

DDAD (Dense Depth for Autonomous Driving)

DDAD is a new autonomous driving benchmark from TRI (Toyota Research Institute) for long range (up to 250m) and dense depth estimation in challenging and diverse urban conditions. It contains monocular videos and accurate ground-truth depth (across a full 360 degree field of view) generated from high-density LiDARs mounted on a fleet of self-driving cars operating in a cross-continental setting. DDAD contains scenes from urban settings in the United States (San Francisco, Bay Area, Cambridge, Detroit, Ann Arbor) and Japan (Tokyo, Odaiba).

73 papers10 benchmarksImages, Videos

VisDrone

VisDrone is a large-scale benchmark with carefully annotated ground-truth for various important computer vision tasks, to make vision meet drones. The VisDrone2019 dataset is collected by the AISKYEYE team at Lab of Machine Learning and Data Mining, Tianjin University, China. The benchmark dataset consists of 288 video clips formed by 261,908 frames and 10,209 static images, captured by various drone-mounted cameras, covering a wide range of aspects including location (taken from 14 different cities separated by thousands of kilometers in China), environment (urban and country), objects (pedestrian, vehicles, bicycles, etc.), and density (sparse and crowded scenes). Note that, the dataset was collected using various drone platforms (i.e., drones with different models), in different scenarios, and under various weather and lighting conditions. These frames are manually annotated with more than 2.6 million bounding boxes of targets of frequent interests, such as pedestrians, cars, bicycl

73 papers0 benchmarksImages

ROxford

misc @inproceedings{RITAC18, author = {Radenovi\'{c}, F. and Iscen, A. and Tolias, G. and Avrithis, Y. and Chum, O.}, title = {Revisiting Oxford and Paris: Large-Scale Image Retrieval Benchmarking}, booktitle = {CVPR}, year = {2018} }

73 papers0 benchmarksImages

IMDB-WIKI

To the best of our knowledge this is the largest publicly available dataset of face images with gender and age labels for training. We provide pretrained models for both age and gender prediction.

73 papers0 benchmarksImages

PKU-MMD

The PKU-MMD dataset is a large skeleton-based action detection dataset. It contains 1076 long untrimmed video sequences performed by 66 subjects in three camera views. 51 action categories are annotated, resulting almost 20,000 action instances and 5.4 million frames in total. Similar to NTU RGB+D, there are also two recommended evaluate protocols, i.e. cross-subject and cross-view.

72 papers44 benchmarksImages, Videos

Birdsnap

Birdsnap is a large bird dataset consisting of 49,829 images from 500 bird species with 47,386 images used for training and 2,443 images used for testing.

72 papers3 benchmarksImages

GuessWhat?!

GuessWhat?! is a large-scale dataset consisting of 150K human-played games with a total of 800K visual question-answer pairs on 66K images.

72 papers0 benchmarksImages, Texts

GRAB

GRAB is a dataset of full-body motions interacting and grasping 3D objects. It contains accurate finger and facial motions as well as the contact between the objects and body. It contains 5 male and 5 female participants and 4 different motion intents. The GRAB dataset also contains binary contact maps between the body and objects.

72 papers7 benchmarksImages

CARPK (car parking lot dataset)

The Car Parking Lot Dataset (CARPK) contains nearly 90,000 cars from 4 different parking lots collected by means of drone (PHANTOM 3 PROFESSIONAL). The images are collected with the drone-view at approximate 40 meters height. The image set is annotated by bounding box per car. All labeled bounding boxes have been well recorded with the top-left points and the bottom-right points. It is supporting object counting, object localizing, and further investigations with the annotation format in bounding boxes.

71 papers2 benchmarksImages

SPair-71k

SPair-71k contains 70,958 image pairs with diverse variations in viewpoint and scale. Compared to previous datasets, it is significantly larger in number and contains more accurate and richer annotations.

71 papers3 benchmarksImages

PreviousPage 18 of 164Next