TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images3,275
  • Texts3,148
  • Videos1,019
  • Audio486
  • Medical395
  • 3D383
  • Time series298
  • Graphs285
  • Tabular271
  • Speech199
  • RGB-D192
  • Environment148
  • Point cloud135
  • Biomedical123
  • LiDAR95
  • RGB Video87
  • Tracking78
  • Biology71
  • Actions68
  • 3d meshes65
  • Tables52
  • Music48
  • EEG45
  • Hyperspectral images45
  • Stereo44
  • MRI39
  • Physics32
  • Interactive29
  • Dialog25
  • Midi22
  • 6D17
  • Replay data11
  • Financial10
  • Ranking10
  • Cad9
  • fMRI7
  • Parallel6
  • Lyrics2
  • PSG2

19,997 dataset results

VisA (Visual Anomaly Dataset)

The VisA dataset contains 12 subsets corresponding to 12 different objects as shown in the above figure. There are 10,821 images with 9,621 normal and 1,200 anomalous samples. Four subsets are different types of printed circuit boards (PCB) with relatively complex structures containing transistors, capacitors, chips, etc. For the case of multiple instances in a view, we collect four subsets: Capsules, Candles, Macaroni1 and Macaroni2. Instances in Capsules and Macaroni2 largely differ in locations and poses. Moreover, we collect four subsets including Cashew, Chewing gum, Fryum and Pipe fryum, where objects are roughly aligned. The anomalous images contain various flaws, including surface defects such as scratches, dents, color spots or crack, and structural defects like misplacement or missing parts.

86 papers7 benchmarks

SPAQ (Smartphone Photography Attribute and Quality)

The Smartphone Photography Attribute and Quality (SPAQ) dataset is a comprehensive database for the perceptual quality assessment of smartphone photography. It was introduced in a paper titled "Perceptual Quality Assessment of Smartphone Photography" presented at the IEEE Conference on Computer Vision and Pattern Recognition in 2020.

86 papers4 benchmarks

CULane

CULane is a large scale challenging dataset for academic research on traffic lane detection. It is collected by cameras mounted on six different vehicles driven by different drivers in Beijing. More than 55 hours of videos were collected and 133,235 frames were extracted. The dataset is divided into 88880 images for training set, 9675 for validation set, and 34680 for test set. The test set is divided into normal and 8 challenging categories.

85 papers4 benchmarksImages, Videos

Orkut

Orkut is a social network dataset consisting of friendship social network and ground-truth communities from Orkut.com on-line social network where users form friendship each other.

85 papers0 benchmarksGraphs

BigEarthNet

BigEarthNet consists of 590,326 Sentinel-2 image patches, each of which is a section of i) 120x120 pixels for 10m bands; ii) 60x60 pixels for 20m bands; and iii) 20x20 pixels for 60m bands.

85 papers8 benchmarksHyperspectral images, Images

Structured3D

Structured3D is a large-scale photo-realistic dataset containing 3.5K house designs (a) created by professional designers with a variety of ground truth 3D structure annotations (b) and generate photo-realistic 2D images (c). The dataset consists of rendering images and corresponding ground truth annotations (e.g., semantic, albedo, depth, surface normal, layout) under different lighting and furniture configurations.

85 papers4 benchmarks3D, Images

CAMUS (Cardiac Acquisitions for Multi-structure Ultrasound Segmentation)

This project aims to provide all the materials to the community to resolve the problem of echocardiographic image segmentation and volume estimation from 2D ultrasound sequences (both two and four-chamber views). To this aim, the following solutions were set up.

85 papers0 benchmarksMedical, Videos

ToxiGen

A large-scale and machine-generated dataset of 274,186 toxic and benign statements about 13 minority groups.

85 papers0 benchmarksTexts

WikiBio (Wikipedia Biography Dataset)

This dataset gathers 728,321 biographies from English Wikipedia. It aims at evaluating text generation algorithms. For each article, we provide the first paragraph and the infobox (both tokenized).

84 papers6 benchmarksTexts

How2

The How2 dataset contains 13,500 videos, or 300 hours of speech, and is split into 185,187 training, 2022 development (dev), and 2361 test utterances. It has subtitles in English and crowdsourced Portuguese translations.

84 papers3 benchmarksAudio, Texts, Videos

PROMISE12

The PROMISE12 dataset was made available for the MICCAI 2012 prostate segmentation challenge. Magnetic Resonance (MR) images (T2-weighted) of 50 patients with various diseases were acquired at different locations with several MRI vendors and scanning protocols.

84 papers1 benchmarksImages, MRI, Medical

TweetEval

TweetEval introduces an evaluation framework consisting of seven heterogeneous Twitter-specific classification tasks.

84 papers8 benchmarksTexts

MiniF2F

MiniF2F is a dataset of formal Olympiad-level mathematics problems statements intended to provide a unified cross-system benchmark for neural theorem proving. The miniF2F benchmark currently targets Metamath, Lean, and Isabelle and consists of 488 problem statements drawn from the AIME, AMC, and the International Mathematical Olympiad (IMO), as well as material from high-school and undergraduate mathematics courses.

84 papers0 benchmarksTexts

CodeContests

CodeContests is a competitive programming dataset for machine-learning. This dataset was used when training AlphaCode.

84 papers4 benchmarksTexts

VITON-HD (High-Resolution VITON-Zalando Dataset)

VITON-HD dataset is a dataset for high-resolution (i.e., 1024x768) virtual try-on of clothing items. Specifically, it consists of 13,679 frontal-view woman and top clothing image pairs.

84 papers5 benchmarksImages

MusicCaps

MusicCaps is a dataset composed of 5.5k music-text pairs, with rich text descriptions provided by human experts. For each 10-second music clip, MusicCaps provides:

84 papers7 benchmarksMusic, Texts

NLVR (Natural Language Visual Reasoningnatural language for visual reasoning)

NLVR contains 92,244 pairs of human-written English sentences grounded in synthetic images. Because the images are synthetically generated, this dataset can be used for semantic parsing.

83 papers3 benchmarksImages, Texts

GovReport

GovReport is a dataset for long document summarization, with significantly longer documents and summaries. It consists of reports written by government research agencies including Congressional Research Service and U.S. Government Accountability Office.

83 papers9 benchmarksTexts

11k Hands

A large dataset of human hand images (dorsal and palmar sides) with detailed ground-truth information for gender recognition and biometric identification.

82 papers0 benchmarks

Referring Expressions for DAVIS 2016 & 2017

Our task is to localize and provide a pixel-level mask of an object on all video frames given a language referring expression obtained either by looking at the first frame only or the full video. To validate our approach we employ two popular video object segmentation datasets, DAVIS16 [38] and DAVIS17 [42]. These two datasets introduce various challenges, containing videos with single or multiple salient objects, crowded scenes, similar looking instances, occlusions, camera view changes, fast motion, etc.

82 papers6 benchmarksTexts
PreviousPage 39 of 1000Next