TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images3,275
  • Texts3,148
  • Videos1,019
  • Audio486
  • Medical395
  • 3D383
  • Time series298
  • Graphs285
  • Tabular271
  • Speech199
  • RGB-D192
  • Environment148
  • Point cloud135
  • Biomedical123
  • LiDAR95
  • RGB Video87
  • Tracking78
  • Biology71
  • Actions68
  • 3d meshes65
  • Tables52
  • Music48
  • EEG45
  • Hyperspectral images45
  • Stereo44
  • MRI39
  • Physics32
  • Interactive29
  • Dialog25
  • Midi22
  • 6D17
  • Replay data11
  • Financial10
  • Ranking10
  • Cad9
  • fMRI7
  • Parallel6
  • Lyrics2
  • PSG2

19,997 dataset results

PRImA

The Prima head pose dataset consists of 2790 images of 15 persons recorded twice. Pitch values lie in the interval [−60∘,60∘], and yaw values lie in the interval [−90∘,90∘] with a 15∘ step. Thus, there are 93 poses available for each person. All the recordings were achieved with the same background. One interesting feature of this dataset is the pose space is uniformly sampled. The dataset is annotated such that a face bounding box (manually annotated) and the corresponding yaw and pitch angle values are provided for each sample.

1 papers1 benchmarksImages

KTH Multiview Football II

KTI Multiview Football II consists of images of professional footballers during a match of the Allsvenskan league. It consists of two parts: one with ground truth pose in 2D and one with ground truth pose in both 2D and 3D. The 3D dataset has 800 time frames, captured from 3 views (2400 images). Views are calibrated and synchronized. 3D ground truth pose and orthographic camera matrices are provided for each frame. There are 14 annotated joints. Lastly, there are 2 different players and two sequences per player.

1 papers0 benchmarksImages

Oxford Town Center

The Oxford Town Center dataset is a 5-minute video with 7500 frames annotated, which is divided into 6500 for training and 1000 for testing data for pedestrian detection. The data was recorded from a CCTV camera in Oxford for research and development into activity and face recognition.

1 papers1 benchmarksVideos

RMRC 2014

The RMRC 2014 indoor dataset is a dataset for indoor semantic segmentation. It employs the NYU Depth V2 and Sun3D datasets to define the training set. The test data consists of newly acquired images.

1 papers0 benchmarksImages, RGB-D

SINS

SINS is a database of continuous real-life audio recordings in a home environment. The home is a vacation home and one person lived there during the recording period of over on week. It was collected using a network of 13 microphone arrays distributed over the multiple rooms. Each microphone array consisted of 4 linearly arranged microphones. Recordings were annotated based on the level of daily activities performed in the environment.

1 papers1 benchmarksAudio

Freesound One-Shot Percussive Sounds

The Freesound One-Shot Percussive Sounds dataset contains 10254 one-shot (single event) percussive sounds from Freesound.org and the corresponding timbral analysis. These were used to train the generative model for "Neural Percussive Synthesis Parameterised by High-Level Timbral Features".

1 papers0 benchmarksAudio

VocalImitationSet

The VocalImitationSet is a collection of crowd-sourced vocal imitations of a large set of diverse sounds collected from Freesound (https://freesound.org/), which were curated based on Google's AudioSet ontology (https://research.google.com/audioset/).

1 papers0 benchmarksAudio

NIH-LN (NIH-Lymph Node)

NIH-Lymph Node (NIH-LN) contains 388 mediastinal LNs in 90 CT scans and 595 abdominal LNs in 86 scans.

1 papers0 benchmarksImages, Medical

gtzan_music_speech

gtzan_music_speech is a dataset for music/speech discrimination. It consists of 120 tracks of 30 second length. Each class (music/speech) has 60 samples. The tracks are all 22050Hz Mono 16-bit audio files in .wav format.

1 papers0 benchmarksAudio

Refer360°

Refer360° is a novel large-scale referring expression recognition dataset consisting of 17,137 instruction sequences and ground-truth actions for completing these instructions in 360° scenes.

1 papers0 benchmarksTexts

OQGend

Dataset OQRanD and OQGenD for paper "Asking the crowd: Asking the Crowd: Question Analysis, Evaluation and Generation for Open Discussion on Online Forums" by Zi Chai, Xinyu Xing, Xiaojun Wan and Bo Huang. This paper is accepted by ACL'19.

1 papers0 benchmarks

OQRanD

Dataset OQRanD and OQGenD for paper "Asking the crowd: Asking the Crowd: Question Analysis, Evaluation and Generation for Open Discussion on Online Forums" by Zi Chai, Xinyu Xing, Xiaojun Wan and Bo Huang. This paper is accepted by ACL'19.

1 papers0 benchmarksTexts

ScienceExamCER

ScienceExamCER is a collection of resources for studying explanation-centered inference, including explanation graphs for 1,680 questions, with 4,950 tablestore rows, and other analyses of the knowledge required to answer elementary and middle-school science questions.

1 papers0 benchmarksTexts

TupleInf Open IE Dataset

The TupleInf Open IE dataset contains Open IE tuples extracted from 263K sentences that were used by the solver in “Answering Complex Questions Using Open Information Extraction” (referred as Tuple KB, T). These sentences were collected from a large Web corpus using training questions from 4th and 8th grade as queries. This dataset contains 156K sentences collected for 4th grade questions and 107K sentences for 8th grade questions. Each sentence is followed by the Open IE v4 tuples using their simple format.

1 papers0 benchmarksTexts

Slim

This dataset consists of virtual scenes rendered in MuJoCo with multiple views each presented in multiple modalities: image, and synthetic or natural language descriptions. Each scene consists of two or three objects placed on a square walled room, and for each of the 10 camera viewpoint the authors rendered a 3D view of the scene as seen from that viewpoint as well as a synthetically generated description of the scene.

1 papers0 benchmarksImages, Texts

Image网

Image网 (pronounced Imagewang; 网 means "net" in Chinese) is an image classification dataset combined from Imagenette and Imagewoof datasets in a way to make it into a semi-supervised unbalanced classification problem:

1 papers0 benchmarks

AutoWeakS

Collects all the courses from XuetangX5, one of the largest MOOCs in China, and this results in 1951 courses. The collected courses involve seven areas: computer science, economics, engineering, foreign language, math, physics, and social science. Each course contains 131 words in its descriptions on average. Contains 706 job postings from the recruiting website operated by JD.com (JD) and 2,456 job postings from the website owned by Tencent corporation (Tencent). The collected job postings involve six areas: technical post, financial post, product post, design post, market post, supply chain and engineering post.

1 papers0 benchmarksTexts

MobiFace

MobiFace is the first dataset for single face tracking in mobile situations. It consists of 80 unedited live-streaming mobile videos captured by 70 different smartphone users in fully unconstrained environments. Over 95K bounding boxes are manually labelled. The videos are carefully selected to cover typical smartphone usage. The videos are also annotated with 14 attributes, including 6 newly proposed attributes and 8 commonly seen in object tracking.

1 papers0 benchmarksVideos

Relational Pattern Similarity Dataset

The relational pattern similarity dataset is a new dataset upon the work of Zeichner et al. (2012), which consists of relational patterns with semantic inference labels annotated. The dataset includes 5,555 pairs extracted by Reverb (Fader et al., 2011), 2,447 pairs with inference relation and 3,108 pairs (the rest) without one.

1 papers0 benchmarksTexts

ARIA (Automated Retinal Image Analysis (ARIA) Data Set)

This data set was collected in 2004 to 2006 in the United Kingdom. Subjects were adult males and females, some of whom were healthy (control group), some with age-related macular degeneration (AMD group), and some were diabetic patients (diabetic group). Unfortunately, no other information from this time exists about this subjects.

1 papers0 benchmarksImages
PreviousPage 363 of 1000Next