TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

3,275 machine learning datasets

Filter by Modality

  • Images3,275
  • Texts3,148
  • Videos1,019
  • Audio486
  • Medical395
  • 3D383
  • Time series298
  • Graphs285
  • Tabular271
  • Speech199
  • RGB-D192
  • Environment148
  • Point cloud135
  • Biomedical123
  • LiDAR95
  • RGB Video87
  • Tracking78
  • Biology71
  • Actions68
  • 3d meshes65
  • Tables52
  • Music48
  • EEG45
  • Hyperspectral images45
  • Stereo44
  • MRI39
  • Physics32
  • Interactive29
  • Dialog25
  • Midi22
  • 6D17
  • Replay data11
  • Financial10
  • Ranking10
  • Cad9
  • fMRI7
  • Parallel6
  • Lyrics2
  • PSG2
Clear filter

3,275 dataset results

PubFig (Public Figures Face Database)

The PubFig database is a large, real-world face dataset consisting of 58,797 images of 200 people collected from the internet. Unlike most other existing face datasets, these images are taken in completely uncontrolled situations with non-cooperative subjects. Thus, there is large variation in pose, lighting, expression, scene, camera, imaging conditions and parameters, etc. The PubFig dataset is similar in spirit to the Labeled Faces in the Wild (LFW) dataset.

2 papers0 benchmarksImages

Rendered Handpose Dataset

Rendered Handpose Dataset contains 41258 training and 2728 testing samples. Each sample provides:

2 papers0 benchmarks3D, Images, RGB-D

RISE

RISE is a large-scale video dataset for Recognizing Industrial Smoke Emissions. A citizen science approach was adopted to collaborate with local community members to annotate whether a video clip has smoke emissions. The dataset contains 12,567 clips from 19 distinct views from cameras that monitored three industrial facilities. These daytime clips span 30 days over two years, including all four seasons.

2 papers0 benchmarksImages

StereoMSI

StereoMSI comprises of 350 registered colour-spectral image pairs. The dataset has been used for the two tracks of the PIRM2018 challenge.

2 papers0 benchmarksImages

SWAX (Sense Wax Attack dataset)

Comprised of real human and wax figure images and videos that endorse the problem of face spoofing detection. The dataset consists of more than 1800 face images and 110 videos of 55 people/waxworks, arranged in training, validation and test sets with a large range in expression, illumination and pose variations.

2 papers0 benchmarksImages

Talk2Nav

Talk2Nav is a large-scale dataset with verbal navigation instructions.

2 papers0 benchmarksImages, Texts

TE141K

A new text effects dataset with 141,081 text effect/glyph pairs in total. The dataset consists of 152 professionally designed text effects rendered on glyphs, including English letters, Chinese characters, and Arabic numerals.

2 papers0 benchmarksImages

UCLA Protest Image

40,764 images (11,659 protest images and hard negatives) with various annotations of visual attributes and sentiments.

2 papers0 benchmarksImages

UFPR-AMR

This dataset contains 2,000 images taken from inside a warehouse of the Energy Company of Paraná (Copel), which directly serves more than 4 million consuming units in the Brazilian state of Paraná.

2 papers1 benchmarksImages

Vistas-NP

The Vistas-NP dataset is an out-of-distribution detection dataset based on the Mapillary Vistas dataset. The original Vistas dataset consists of 18,000 training images and 2,000 validation images with 66 classes. In Vistas-NP the human classes are used as outliers due to their dispersion across scenes and visual diversity from other objects. The dataset is created by excluding all images with class person and the three rider classes to the test subset. Consequently, the dataset has 8,003 train images and 830 validation images. The test set contains 11,167.

2 papers0 benchmarksImages

VizWiz-QualityIssues

A large-scale dataset that links the assessment of image quality issues to two practical vision tasks: image captioning and visual question answering.

2 papers0 benchmarksImages

Visual Beliefs

Visual Beliefs is a dataset of abstract scenes to study visual beliefs. The dataset consists of 8-frame scenes, and in each scene a person has a mistaken belief. The dataset can be used for two tasks: predicting who is mistaken and predicting when are they mistaken.

2 papers0 benchmarksImages

RSOC (Remote Sensing Object Counting)

RSOC is a large-scale object counting dataset with remote sensing images, which contains four important geographic objects: buildings, crowded ships in harbors, large-vehicles and small-vehicles in parking lots.

2 papers0 benchmarksImages

FSVQA (Full-Sentence Visual Question Answering)

Full-Sentence Visual Question Answering (FSVQA) dataset, consisting of nearly 1 million pairs of questions and full-sentence answers for images, built by applying a number of rule-based natural language processing techniques to original VQA dataset and captions in the MS COCO dataset.

2 papers0 benchmarksImages, Texts

VQA 360°

VQA 360° is a dataset for visual question answering on 360° images containing around 17,000 real-world image-question-answer triplets for a variety of question types.

2 papers0 benchmarksImages, Texts

PHSPD (Polarization Human Shape and Pose Dataset)

PHSPD is a home-grown polarization image dataset of various human shapes and poses.

2 papers0 benchmarksImages

Multi Task Crowd

Multi Task Crowd is a new 100 image dataset fully annotated for crowd counting, violent behaviour detection and density level classification.

2 papers0 benchmarksImages

C&Z

One of the first datasets (if not the first) to highlight the importance of bias and diversity in the community, which started a revolution afterwards. Introduced in 2014 as integral part of a thesis of Master of Science [1,2] at Carnegie Mellon and City University of Hong Kong. It was later expanded by adding synthetic images generated by a GAN architecture at ETH Zürich (in HDCGAN by Curtó et al. 2017). Being then not only the pioneer of talking about the importance of balanced datasets for learning and vision but also for being the first GAN augmented dataset of faces.

2 papers0 benchmarksImages

TRN (Toulouse Road Network)

The Toulouse Road Network dataset describes patches of road maps from the city of Toulouse, represented both as spatial graphs G = (A, X) and as grayscale segmentation images.

2 papers0 benchmarksGraphs, Images

FRGC-Morphs

FRGC-Morphs is a dataset of morphed faces selected from the publicly available FRGC dataset [1].

2 papers0 benchmarksImages
PreviousPage 94 of 164Next