Datasets

3,275 machine learning datasets

3,275 dataset results

T-LESS

T-LESS is a dataset for estimating the 6D pose, i.e. translation and rotation, of texture-less rigid objects. The dataset features thirty industry-relevant objects with no significant texture and no discriminative color or reflectance properties. The objects exhibit symmetries and mutual similarities in shape and/or size. Compared to other datasets, a unique property is that some of the objects are parts of others. The dataset includes training and test images that were captured with three synchronized sensors, specifically a structured-light and a time-of-flight RGB-D sensor and a high-resolution RGB camera. There are approximately 39K training and 10K test images from each sensor. Additionally, two types of 3D models are provided for each object, i.e. a manually created CAD model and a semi-automatically reconstructed one. Training images depict individual objects against a black background. Test images originate from twenty test scenes having varying complexity, which increases from

94 papers6 benchmarks3D, Images, RGB-D

VGG Face

The VGG Face dataset is face identity recognition dataset that consists of 2,622 identities. It contains over 2.6 million images.

94 papers0 benchmarksImages

NC4K

As far as we know, there only exists one large camouflaged object testing dataset, the COD10K, while the sizes of other testing datasets are less than 300. We then contribute another camouflaged object testing dataset, namely NC4K, which includes 4,121 images downloaded from the Internet. The new testing dataset can be used to evaluate the generalization ability of existing models.

94 papers21 benchmarksImages

JAFFE (Japanese Female Facial Expression)

The JAFFE dataset consists of 213 images of different facial expressions from 10 different Japanese female subjects. Each subject was asked to do 7 facial expressions (6 basic facial expressions and neutral) and the images were annotated with average semantic ratings on each facial expression by 60 annotators.

93 papers9 benchmarksImages

SUN360 (Scene UNderstanding 360° panorama)

The goal of the SUN360 panorama database is to provide academic researchers in computer vision, computer graphics and computational photography, cognition and neuroscience, human perception, machine learning and data mining, with a comprehensive collection of annotated panoramas covering 360x180-degree full view for a large variety of environmental scenes, places and the objects within. To build the core of the dataset, the authors download a huge number of high-resolution panorama images from the Internet, and group them into different place categories. Then, they designed a WebGL annotation tool for annotating the polygons and cuboids for objects in the scene.

93 papers3 benchmarksImages

CUHK-PEDES

The CUHK-PEDES dataset is a caption-annotated pedestrian dataset. It contains 40,206 images over 13,003 persons. Images are collected from five existing person re-identification datasets, CUHK03, Market-1501, SSM, VIPER, and CUHK01 while each image is annotated with 2 text descriptions by crowd-sourcing workers. Sentences incorporate rich details about person appearances, actions, poses.

93 papers15 benchmarksImages, Texts

Aachen Day-Night

Aachen Day-Night is a dataset designed for benchmarking 6DOF outdoor visual localization in changing conditions. It focuses on localizing high-quality night-time images against a day-time 3D model. There are 14,607 images with changing conditions of weather, season and day-night cycles.

93 papers0 benchmarks3D, Images

xView

xView is one of the largest publicly available datasets of overhead imagery. It contains images from complex scenes around the world, annotated using bounding boxes. It contains over 1M object instances from 60 different classes.

93 papers5 benchmarksImages

VOC 2012 (The PASCAL Visual Object Classes Challenge 2012)

see detailed use case on code implementation of the paper 'Tell Me Where To Look: Guided Attention Inference Networks'

93 papers0 benchmarksImages

UCSD Ped2 (UCSD Anomaly Detection Dataset)

The UCSD Anomaly Detection Dataset was acquired with a stationary camera mounted at an elevation, overlooking pedestrian walkways. The crowd density in the walkways was variable, ranging from sparse to very crowded. In the normal setting, the video contains only pedestrians. Abnormal events are due to either: the circulation of non pedestrian entities in the walkways anomalous pedestrian motion patterns Commonly occurring anomalies include bikers, skaters, small carts, and people walking across a walkway or in the grass that surrounds it. A few instances of people in wheelchair were also recorded. All abnormalities are naturally occurring, i.e. they were not staged for the purposes of assembling the dataset. The data was split into 2 subsets, each corresponding to a different scene. The video footage recorded from each scene was split into various clips of around 200 frames.

92 papers5 benchmarksImages, Videos

Sim10k

SIM10k is a synthetic dataset containing 10,000 images, which is rendered from the video game Grand Theft Auto V (GTA5).

92 papers1 benchmarksImages

VITON (VITON-Zalando Dataset)

VITON was a dataset for virtual try-on of clothing items. It consisted of 16,253 pairs of images of a person and a clothing item .

92 papers12 benchmarksImages

MIT-States

The MIT-States dataset has 245 object classes, 115 attribute classes and ∼53K images. There is a wide range of objects (e.g., fish, persimmon, room) and attributes (e.g., mossy, deflated, dirty). On average, each object instance is modified by one of the 9 attributes it affords.

91 papers13 benchmarksImages

FaceWarehouse

FaceWarehouse is a 3D facial expression database that provides the facial geometry of 150 subjects, covering a wide range of ages and ethnic backgrounds.

91 papers1 benchmarks3D, Images

DanceTrack

A large-scale multi-object tracking dataset for human tracking in occlusion, frequent crossover, uniform appearance and diverse body gestures. It is proposed to emphasize the importance of motion analysis in multi-object tracking instead of mainly appearance-matching-based diagram.

91 papers10 benchmarksImages, Videos

Oxford-IIIT Pet Dataset

The Oxford-IIIT Pet Dataset has 37 categories with roughly 200 images for each class. The images have a large variations in scale, pose and lighting. All images have an associated ground truth annotation of breed, head ROI, and pixel level trimap segmentation.

90 papers16 benchmarksImages

ST-VQA (Scene Text Visual Question Answering)

ST-VQA aims to highlight the importance of exploiting high-level semantic information present in images as textual cues in the VQA process.

90 papers0 benchmarksImages, Texts

Winoground

Winoground is a dataset for evaluating the ability of vision and language models to conduct visio-linguistic compositional reasoning. Given two images and two captions, the goal is to match them correctly -- but crucially, both captions contain a completely identical set of words, only in a different order. The dataset was carefully hand-curated by expert annotators and is labeled with a rich set of fine-grained tags to assist in analyzing model performance.

90 papers3 benchmarksImages, Texts

MSU-MFSD

The MSU-MFSD dataset contains 280 video recordings of genuine and attack faces. 35 individuals have participated in the development of this database with a total of 280 videos. Two kinds of cameras with different resolutions (720×480 and 640×480) were used to record the videos from the 35 individuals. For the real accesses, each individual has two video recordings captured with the Laptop cameras and Android, respectively. For the video attacks, two types of cameras, the iPhone and Canon cameras were used to capture high definition videos on each of the subject. The videos taken with Canon camera were then replayed on iPad Air screen to generate the HD replay attacks while the videos recorded by the iPhone mobile were replayed itself to generate the mobile replay attacks. Photo attacks were produced by printing the 35 subjects’ photos on A3 papers using HP colour printer. The recording videos with respect to the 35 individuals were divided into training (15 subjects with 120 videos) an

89 papers16 benchmarksImages, Videos

COCO-Text

The COCO-Text dataset is a dataset for text detection and recognition. It is based on the MS COCO dataset, which contains images of complex everyday scenes. The COCO-Text dataset contains non-text images, legible text images and illegible text images. In total there are 22184 training images and 7026 validation images with at least one instance of legible text.

89 papers6 benchmarksImages, Texts

PreviousPage 16 of 164Next