3,275 machine learning datasets
The 300-W is a face dataset that consists of 300 indoor and 300 outdoor in-the-wild images. It covers a large variation of identity, expression, illumination conditions, pose, occlusion and face size. The images were downloaded from google.com using queries such as “party”, “conference”, “protests”, “football” and “celebrities”. Compared to other in-the-wild datasets, the 300-W database contains a larger percentage of partially occluded images and covers more expressions than the common “neutral” or “smile”, such as “surprise” or “scream”. Images were annotated with the 68-point mark-up using a semi-automatic methodology. The images of the database were carefully selected so that they represent a characteristic sample of challenging but natural face instances under totally unconstrained conditions; methods that achieve accurate performance on the 300-W database can therefore be expected to perform similarly in most realistic cases. Many images of the database contain more than one annotated face.
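68-point landmark annotations of this kind are commonly distributed as plain-text “.pts” files (a header, then one “x y” pair per line between braces). A minimal parsing sketch under that assumption, with a hypothetical filename:

    # Hedged sketch: read a 68-point landmark file in the common ".pts" text
    # format used for 300-W-style annotations. The filename is a placeholder.
    import numpy as np

    def read_pts(path):
        """Return an (N, 2) array of (x, y) landmark coordinates."""
        with open(path) as f:
            lines = [ln.strip() for ln in f if ln.strip()]
        start = lines.index("{") + 1   # coordinates sit between the braces
        end = lines.index("}")
        coords = [list(map(float, ln.split())) for ln in lines[start:end]]
        return np.asarray(coords)      # shape (68, 2) for the 300-W mark-up

    landmarks = read_pts("indoor_001.pts")  # hypothetical file
    print(landmarks.shape)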
CIFAR100 few-shots (CIFAR-FS) is randomly sampled from CIFAR-100 (Krizhevsky & Hinton, 2009) using the same criteria with which miniImageNet was generated. The average inter-class similarity is sufficiently high to represent a challenge for the current state of the art. Moreover, the limited original resolution of 32×32 makes the task harder and at the same time allows fast prototyping.
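As an illustration of the episodic (N-way, K-shot) evaluation this dataset targets, here is a sketch that samples one episode from CIFAR-100 via torchvision. It does not reproduce the actual CIFAR-FS meta-train/val/test class split; the random class choice is a placeholder.

    # Illustrative only: N-way K-shot episode sampling over CIFAR-100 classes.
    import random
    from collections import defaultdict
    from torchvision.datasets import CIFAR100

    dataset = CIFAR100(root="data", train=True, download=True)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(dataset.targets):
        by_class[label].append(idx)

    def sample_episode(n_way=5, k_shot=1, n_query=15):
        classes = random.sample(list(by_class), n_way)
        support, query = [], []
        for c in classes:
            picks = random.sample(by_class[c], k_shot + n_query)
            support += picks[:k_shot]
            query += picks[k_shot:]
        return support, query

    support_idx, query_idx = sample_episode()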
LAION 5B is a large-scale dataset for research purposes consisting of 5.85B CLIP-filtered image-text pairs: 2.3B contain English text, 2.2B come from 100+ other languages, and 1B samples have texts that do not allow a clear language assignment (e.g., names). Additionally, the release provides several nearest-neighbor indices, an improved web interface for exploration and subset creation, as well as watermark and NSFW detection scores.
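The metadata (image URLs plus captions) can be streamed with the Hugging Face datasets library. A hedged sketch follows; the repository id "laion/laion2B-en" and the field names "URL" and "TEXT" are assumptions about the hosted metadata, not stated on this page.

    # Hedged sketch: stream LAION English-subset metadata without downloading it.
    from datasets import load_dataset

    laion = load_dataset("laion/laion2B-en", split="train", streaming=True)
    for sample in laion.take(3):
        print(sample["URL"], sample["TEXT"][:80])  # assumed field names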
COCO Captions contains over one and a half million captions describing over 330,000 images. For the training and validation images, five independent human-generated captions are provided for each image.
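A minimal loading sketch using torchvision's CocoCaptions wrapper (which requires pycocotools); it assumes the images and the caption annotation JSON have already been downloaded from the COCO website, and the paths below are placeholders.

    from torchvision.datasets import CocoCaptions
    import torchvision.transforms as T

    captions_ds = CocoCaptions(
        root="coco/train2017",                               # image directory (placeholder)
        annFile="coco/annotations/captions_train2017.json",  # caption file (placeholder)
        transform=T.ToTensor(),
    )

    image, captions = captions_ds[0]
    print(len(captions))  # typically five human-written captions per image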
YouTube-VOS is a Video Object Segmentation (VOS) dataset that contains 4,453 videos: 3,471 for training, 474 for validation, and 508 for testing. The training and validation videos have pixel-level ground-truth annotations for every 5th frame (6 fps). It also contains instance segmentation annotations. The dataset covers more than 7,800 unique objects, 190k high-quality manual annotations, and more than 340 minutes of video.
The Leeds Sports Pose (LSP) dataset is a widely used benchmark for human pose estimation. The original LSP dataset contains 2,000 images of sportspersons gathered from Flickr, 1,000 for training and 1,000 for testing. Each image is annotated with 14 joint locations, where left and right joints are consistently labelled from a person-centric viewpoint. The extended LSP dataset contains an additional 10,000 images labelled for training.
The HELEN dataset is composed of 2,330 face images of 400×400 pixels with labeled facial components generated through manually annotated contours along the eyes, eyebrows, nose, lips and jawline.
The PASCAL Visual Object Classes (VOC) 2012 dataset contains 20 object categories spanning vehicles, household objects, animals, and other classes: aeroplane, bicycle, boat, bus, car, motorbike, train, bottle, chair, dining table, potted plant, sofa, TV/monitor, bird, cat, cow, dog, horse, sheep, and person. Each image has pixel-level segmentation annotations, bounding-box annotations, and object class annotations. The dataset has been widely used as a benchmark for object detection, semantic segmentation, and classification. It is split into three subsets: 1,464 images for training, 1,449 images for validation, and a private test set.
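A short sketch loading the 2012 segmentation split with torchvision; image_set can be "train" or "val" (the test split is not public).

    from torchvision.datasets import VOCSegmentation

    voc_train = VOCSegmentation(
        root="data", year="2012", image_set="train", download=True
    )
    image, mask = voc_train[0]   # PIL image and its pixel-level segmentation mask
    print(len(voc_train))        # 1,464 training images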
MPI (Max Planck Institute) Sintel is a dataset for optical flow evaluation that provides 1,064 synthesized stereo images with ground-truth disparity data. It is derived from the open-source 3D animated short film Sintel and covers 23 different scenes. The stereo images are RGB while the disparity maps are grayscale; both have a resolution of 1024×436 pixels with 8 bits per channel.
CINIC-10 is a dataset for image classification. It has a total of 270,000 images, 4.5 times the size of CIFAR-10. It is constructed from two different sources, ImageNet and CIFAR-10, and was compiled as a bridge between the two. It is split into three equal subsets, train, validation, and test, each of which contains 90,000 images.
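A hedged loading sketch: CINIC-10 is commonly distributed as train/valid/test folders with one sub-directory per class, so torchvision's generic ImageFolder can read it directly. The root path is a placeholder for wherever the archive was extracted.

    from torchvision.datasets import ImageFolder
    import torchvision.transforms as T

    transform = T.ToTensor()
    train_set = ImageFolder("cinic-10/train", transform=transform)
    valid_set = ImageFolder("cinic-10/valid", transform=transform)
    test_set  = ImageFolder("cinic-10/test",  transform=transform)
    print(len(train_set), len(valid_set), len(test_set))  # 90,000 each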
The Moving MNIST dataset contains 10,000 video sequences, each consisting of 20 frames. In each video sequence, two digits move independently around the frame, which has a spatial resolution of 64×64 pixels. The digits frequently intersect with each other and bounce off the edges of the frame.
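A hedged loading sketch: the canonical Moving MNIST release is a single NumPy array (commonly named "mnist_test_seq.npy") of shape (20, 10000, 64, 64), i.e. 20 frames for each of the 10,000 sequences. The filename and its location below are assumptions about where it was downloaded.

    import numpy as np

    seqs = np.load("mnist_test_seq.npy")   # (frames, sequences, H, W)
    seqs = seqs.transpose(1, 0, 2, 3)      # -> (10000, 20, 64, 64)
    video = seqs[0]                        # one 20-frame 64x64 sequence
    print(seqs.shape, video.dtype)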
MNIST-M is created by combining MNIST digits with patches randomly extracted from color photos in BSDS500 as their background. It contains 59,001 training and 90,001 test images.
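An illustrative sketch of this kind of blend: a digit is combined with a random colour patch by taking the per-channel absolute difference, a recipe used in common reimplementations (treat it as an assumption here, not a description of the official generation script). `digit` is a 28×28 grayscale array and `photo` a BSDS500 colour image as a NumPy array.

    import numpy as np

    def make_mnistm_sample(digit, photo, rng=np.random.default_rng()):
        h, w = digit.shape
        y = rng.integers(0, photo.shape[0] - h)       # random patch location
        x = rng.integers(0, photo.shape[1] - w)
        patch = photo[y:y + h, x:x + w].astype(np.float32)
        blended = np.abs(patch - digit[..., None].astype(np.float32))
        return blended.astype(np.uint8)               # colour digit on photo background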
The MOTChallenge datasets are designed for the task of multiple object tracking. Several variants of the dataset have been released over the years, such as MOT15, MOT17, and MOT20.
CMU Multimodal Opinion Sentiment and Emotion Intensity (CMU-MOSEI) is the largest dataset of sentence-level sentiment analysis and emotion recognition in online videos. CMU-MOSEI contains over 12 hours of annotated video from more than 1,000 speakers and 250 topics.
RESISC45 is a dataset for Remote Sensing Image Scene Classification (RESISC). It contains 31,500 RGB images of size 256×256 divided into 45 scene classes, each containing 700 images. Among its notable features, RESISC45 covers varying spatial resolutions ranging from 20 cm to more than 30 m per pixel.
Celeb-DF is a large-scale challenging dataset for deepfake forensics. It includes 590 original videos collected from YouTube, featuring subjects of different ages, ethnic groups and genders, and 5,639 corresponding DeepFake videos.
SUNCG is a large-scale dataset of synthetic 3D scenes with dense volumetric annotations.
The Extended Yale B database contains 2,414 frontal-face images of size 192×168 pixels from 38 subjects, with about 64 images per subject. The images were captured under different lighting conditions and with various facial expressions.
SA-1B consists of 11M diverse, high-resolution, licensed, and privacy-protecting images and 1.1B high-quality segmentation masks.
OTB-2015, also referred to as the Visual Tracker Benchmark, is a visual tracking dataset. It contains 100 commonly used video sequences for evaluating visual tracking. Image source: http://cvlab.hanyang.ac.kr/tracker_benchmark/datasets.html