Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images: 3,275
  • Texts: 3,148
  • Videos: 1,019
  • Audio: 486
  • Medical: 395
  • 3D: 383
  • Time series: 298
  • Graphs: 285
  • Tabular: 271
  • Speech: 199
  • RGB-D: 192
  • Environment: 148
  • Point cloud: 135
  • Biomedical: 123
  • LiDAR: 95
  • RGB Video: 87
  • Tracking: 78
  • Biology: 71
  • Actions: 68
  • 3d meshes: 65
  • Tables: 52
  • Music: 48
  • EEG: 45
  • Hyperspectral images: 45
  • Stereo: 44
  • MRI: 39
  • Physics: 32
  • Interactive: 29
  • Dialog: 25
  • Midi: 22
  • 6D: 17
  • Replay data: 11
  • Financial: 10
  • Ranking: 10
  • Cad: 9
  • fMRI: 7
  • Parallel: 6
  • Lyrics: 2
  • PSG: 2

GMEG-yahoo

Grammatical error correction dataset for text from Yahoo! Answers

2 papers · 0 benchmarks

VGaokao

VGaokao is a verification style reading comprehension dataset designed for native speakers' evaluation.

2 papers · 0 benchmarks · Texts

Roof-Image Dataset

We created a building-image paired dataset that contains more than 3K samples using our roof modeling tools.

2 papers · 0 benchmarks

Berkeley MHAD (Berkeley Multimodal Human Action Database)

The Berkeley Multimodal Human Action Database (MHAD) contains 11 actions performed by 7 male and 5 female subjects in the range 23-30 years of age, except for one elderly subject. All the subjects performed 5 repetitions of each action, yielding about 660 action sequences, which correspond to about 82 minutes of total recording time. In addition, a T-pose was recorded for each subject, which can be used for skeleton extraction, along with background data (with and without the chair used in some of the activities). The specified set of actions comprises the following: (1) actions with movement in both upper and lower extremities, e.g., jumping in place, jumping jacks, throwing, etc., (2) actions with high dynamics in upper extremities, e.g., waving hands, clapping hands, etc., and (3) actions with high dynamics in lower extremities, e.g., sit down, stand up. Prior to each recording, the subjects were given instructions on what action to perform; however no specific deta

2 papers · 0 benchmarks

EmoCause

EmoCause is a dataset of annotated emotion cause words in emotional situations from the EmpatheticDialogues valid and test set. The goal is to recognize emotion cause words in sentences by training only on sentence-level emotion labels without word-level labels (i.e., weakly-supervised emotion cause recognition).

2 papers · 3 benchmarks · Texts

ARCA23K

ARCA23K is a dataset of labelled sound events created to investigate real-world label noise. It contains 23,727 audio clips originating from Freesound, and each clip belongs to one of 70 classes taken from the AudioSet ontology. The dataset was created using an entirely automated process with no manual verification of the data. For this reason, many clips are expected to be labelled incorrectly.

2 papers · 0 benchmarks · Audio

Saint Gall

The Saint Gall dataset contains handwritten historical manuscripts written in Latin that date back to the 9th century. It consists of 60 pages, 1,410 text lines and 11,597 words.

2 papers · 2 benchmarks · Images, Texts

EFO-1-QA

EFO-1-QA is a new dataset to benchmark the combinatorial generalizability of Complex Query Answering (CQA) models by including 301 different query types, which is 20 times larger than existing datasets.

2 papers · 0 benchmarks · Texts

BiRdQA

BiRdQA is a bilingual multiple-choice question answering dataset with 6614 English riddles and 8751 Chinese riddles.

2 papers · 0 benchmarks · Texts

GermEval

The GermEval dataset is a resource for natural language processing (NLP) tasks, specifically named entity recognition (NER), in the German language.

2 papers · 0 benchmarks · Texts

ERATO

ERATO is a large-scale multi-modal dataset for Pairwise Emotional Relationship Recognition (PERR). It has 31,182 video clips, lasting about 203 video hours. Unlike existing datasets, ERATO contains interaction-centric videos with multiple shots, varied video length, and multiple modalities including visual, audio and text.

2 papers · 0 benchmarks · Videos

AStitchInLanguageModels

AStitchInLanguageModels is a dataset for the exploration of idiomaticity in pre-trained language models.

2 papers · 0 benchmarks

MAVS (Multilingual Audio-Visual Smartphone dataset)

MAVS is an audio-visual smartphone dataset captured with five different recent smartphones. It contains 103 subjects captured in three different sessions covering different real-world scenarios. Three different languages are included so that the language dependency of speaker recognition systems can be studied.

2 papers · 0 benchmarks · Speech, Videos

MuViHand

MuViHand is a dataset for 3D Hand Pose Estimation that consists of multi-view videos of the hand along with ground-truth 3D pose labels. The dataset includes more than 402,000 synthetic hand images available in 4,560 videos. The videos have been simultaneously captured from six different angles with complex backgrounds and random levels of dynamic lighting. The data has been captured from 10 distinct animated subjects using 12 cameras in a semi-circle topology.

2 papers · 0 benchmarks · Images

Paint4Poem

Paint4Poem consists of 301 high-quality poem-painting pairs collected manually from the influential modern Chinese artist Feng Zikai.

2 papers · 0 benchmarks

TCP-CI (Test Case Prioritization in CI Contexts)

This dataset is a benchmark of 25 open-source subjects with 21.5k builds and 3.6k failed builds that enables a fair comparison and evaluation of Test Case Prioritization (TCP) techniques. The data collection tools are available and can be used to extend and update the subjects. A description of the dataset's structure and files can also be found in the documentation of the data collection tool.

2 papers · 0 benchmarks

VVAD-LRS3

A dataset for Visual Voice Activity Detection extracted from the LRS3 dataset.

2 papers · 0 benchmarks · Images

OpenViDial 2.0

OpenViDial 2.0 is a larger-scale open-domain multi-modal dialogue dataset than the previous version, OpenViDial 1.0. It contains a total of 5.6 million dialogue turns extracted from movies or TV series from different resources, and each dialogue turn is paired with its corresponding visual context.

2 papers · 20 benchmarks · Images, Texts

EDGAR-CORPUS

EDGAR-CORPUS is a novel corpus comprising annual reports from all the publicly traded companies in the US spanning a period of more than 25 years. All the reports are downloaded, split into their corresponding items (sections), and provided in a clean, easy-to-use JSON format.

2 papers · 0 benchmarks · Texts

OV

The OV dataset is a camera calibration dataset. There are 16 lenses ranging from 90° to 180° FOV.

2 papers · 0 benchmarks