Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

19,997 dataset results

TbD

6 papers · 33 benchmarks

Paris6k

6 papers · 1 benchmark

TrackML challenge Accuracy phase dataset (Tracking Machine Learning Challenge)

The dataset comprises multiple independent events, where each event contains simulated measurements (essentially 3D points) of particles generated in collisions between proton bunches at the Large Hadron Collider at CERN. The goal of the tracking machine learning challenge is to group the recorded measurements, or hits, of each event into tracks: sets of hits that belong to the same initial particle. A solution must uniquely associate each hit with one track. The training dataset contains the recorded hits, their ground-truth counterparts, their association to particles, and the initial parameters of those particles. The test dataset contains only the recorded hits.

6 papers · 0 benchmarks
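
Below is a minimal sketch of how a valid solution might be assembled for one event, assuming the per-event CSV layout of the public challenge release; the file names and the naive angular binning are illustrative, not the reference approach:

    import numpy as np
    import pandas as pd

    # Assumed file names and columns from the public challenge release.
    hits = pd.read_csv("event000001000-hits.csv")    # hit_id, x, y, z, ...
    truth = pd.read_csv("event000001000-truth.csv")  # hit_id, particle_id, ...

    # A valid solution assigns exactly one track_id to every hit.
    # Naive illustration: bin hits by azimuthal angle.
    phi = np.arctan2(hits["y"], hits["x"])
    submission = pd.DataFrame({
        "event_id": 1000,
        "hit_id": hits["hit_id"],
        "track_id": pd.cut(phi, bins=100, labels=False),
    })
    submission.to_csv("submission.csv", index=False)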

PIE (Pedestrian Intention Estimation)

PIE is a dataset for studying pedestrian behavior in traffic. PIE contains over 6 hours of footage recorded in typical traffic scenes with an on-board camera. It also provides accurate vehicle information from an OBD sensor (vehicle speed, heading direction and GPS coordinates) synchronized with the video footage. Rich spatial and behavioral annotations are available for pedestrians and vehicles that potentially interact with the ego-vehicle, as well as for the relevant elements of infrastructure (traffic lights, signs and zebra crossings). There are over 300K labeled video frames with 1,842 pedestrian samples, making this the largest publicly available dataset for studying pedestrian behavior in traffic.

6 papers · 5 benchmarks

Libri-adhoc40

Libri-adhoc40 is a synchronized speech corpus that records Librispeech data, replayed through loudspeakers, with an ad-hoc microphone array of 40 strongly synchronized distributed nodes in a real office environment. In addition, to provide an evaluation target for speech front-end processing and other applications, the authors also recorded the replayed speech in an anechoic chamber.

6 papers · 0 benchmarks · Speech

DogWhistle

Cant (also known as doublespeak, cryptolect, argot, anti-language or secret language) is important for understanding advertising, comedies and dog-whistle politics. DogWhistle is a large and diverse Chinese dataset for creating and understanding cant from a computational linguistics perspective.

6 papers · 0 benchmarks · Texts

L3DAS21

L3DAS21 is a dataset for 3D audio signal processing. It consists of a 65-hour 3D audio corpus, accompanied by a Python API that facilitates data usage and the results-submission stage.

6 papers · 4 benchmarks · Audio

AcinoSet

AcinoSet is a dataset of free-running cheetahs in the wild that contains 119,490 frames of multi-view synchronized high-speed video footage, camera calibration files, and 7,588 human-annotated frames. The authors used markerless animal pose estimation with DeepLabCut to provide 2D keypoints for all 119K frames. The dataset also includes 3D trajectories, human-checked 3D ground truth, and an interactive tool for inspecting the data.

6 papers · 0 benchmarks · Images

IIIT-ILST

IIIT-ILST is a dataset and benchmark for scene-text recognition in three Indic scripts: Devanagari, Telugu and Malayalam. IIIT-ILST contains nearly 1,000 real images per script, annotated with scene-text bounding boxes and transcriptions.

6 papers · 0 benchmarks · Images, Texts

Ulm-TSST (Ulm-Trier Social Stress Dataset)

Ulm-TSST is a dataset for continuous emotion (valence and arousal) prediction and 'physiological-emotion' prediction. It is a multimodal, richly annotated dataset of self-reported and external dimensional ratings of emotion and mental well-being. After a brief period of preparation, the subjects are asked to give an oral presentation in a job-interview setting. Ulm-TSST includes biological recordings such as electrocardiogram (ECG), electrodermal activity (EDA), respiration, and heart rate (BPM), as well as continuous arousal and valence annotations. With 105 participants (69.5% female) aged between 18 and 39 years, a total of 10 hours of data were accumulated.

6 papers · 0 benchmarks

OmniFlow

OmniFlow is a synthetic omnidirectional human optical flow dataset. Using a rendering engine, the authors created a naturalistic 3D indoor environment with textured rooms, characters, actions, objects, illumination and motion blur, where all components of the environment are shuffled during the data-capturing process. The simulation outputs rendered images of household activities together with the corresponding forward and backward optical flow. The dataset consists of 23,653 image pairs with corresponding forward and backward optical flow.

6 papers · 0 benchmarks · Images

NLmaps

There are two versions of the NLmaps corpus: NLmaps (v1) and its extension, NLmaps v2. Both versions consist of questions about geographical facts that can be answered with the OpenStreetMap (OSM) database (available under the Open Database License). The questions are in English, and each has a corresponding Machine Readable Language (MRL) parse. Gold answers can be obtained by executing the gold parses against the OSM database using the NLmaps backend, which is based on the Overpass API (available under the Affero GPL v3).

6 papers · 0 benchmarks · Texts
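
As an illustration of the execution step, the sketch below sends a hand-written Overpass QL query to the public Overpass API endpoint; an actual NLmaps gold parse would first be compiled to such a query by the NLmaps backend (the compilation itself is not shown, and the query is invented for illustration):

    import requests

    # Illustrative query (not an NLmaps MRL parse): count cafes in Heidelberg.
    query = """
    [out:json][timeout:25];
    area["name"="Heidelberg"]->.a;
    node["amenity"="cafe"](area.a);
    out count;
    """
    resp = requests.post("https://overpass-api.de/api/interpreter",
                         data={"data": query})
    resp.raise_for_status()
    print(resp.json()["elements"])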

SocNav1

SocNav1 is a dataset for social navigation conventions. Its aims are two-fold: a) enabling comparison of the algorithms that robots use to assess the convenience of their presence in a particular position when navigating; and b) providing a sufficient amount of data so that modern machine learning algorithms, such as deep neural networks, can be used. Because of the structured nature of the data, SocNav1 is particularly well suited to benchmarking non-Euclidean machine learning algorithms such as graph neural networks.

6 papers · 0 benchmarks
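
A hypothetical sketch of how a SocNav1-style scene might be encoded as a graph for a graph neural network; the node and edge attributes are invented for illustration and do not reproduce the dataset's exact schema:

    import networkx as nx

    # One scene: the robot plus two nearby people (attributes invented).
    g = nx.Graph()
    g.add_node("robot", kind="robot", x=0.0, y=0.0)
    g.add_node("person_1", kind="human", x=1.2, y=0.5)
    g.add_node("person_2", kind="human", x=-0.8, y=2.0)
    g.add_edge("robot", "person_1", dist=1.3)
    g.add_edge("robot", "person_2", dist=2.2)
    g.add_edge("person_1", "person_2", dist=2.5)  # e.g. an interacting pair

    # A GNN would consume this graph and score how socially convenient
    # the robot's current position is.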

QA-SRL Bank 2.0

QA-SRL Bank 2.0 is a large-scale corpus of Question-Answer driven Semantic Role Labeling (QA-SRL) annotations. The corpus consists of over 250,000 question-answer pairs for over 64,000 sentences across 3 domains and was gathered with a new crowd-sourcing scheme that was shown to have high precision and good recall at modest cost.

6 papers · 0 benchmarks · Texts
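
The annotation scheme is easiest to see on an example. The following is a hypothetical, simplified rendering of one verb's annotations; the released corpus uses its own JSON layout, and the field names and sentence here are invented:

    # Invented example of the QA-SRL idea: each verb gets wh-questions
    # whose answers are spans of the sentence.
    annotation = {
        "sentence": "The company acquired the startup in 2015 .".split(),
        "verb": "acquired",
        "qa_pairs": [
            ("Who acquired something?", "The company"),
            ("What was acquired?", "the startup"),
            ("When did someone acquire something?", "in 2015"),
        ],
    }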

Sketchy

6 papers · 2 benchmarks

VMRD (Visual Manipulation Relationship Dataset)

VMRD is a multi-object grasp dataset. It was collected and labeled using hundreds of objects from 31 categories. In total there are 5,185 images containing 17,688 object instances and 51,530 manipulation relationships.

6 papers · 0 benchmarks · Images
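
Manipulation relationships encode which objects must be grasped before others, e.g. an object resting on another must be removed first. A hypothetical sketch of turning such relationships into a safe grasping order; the object names and edges are invented for illustration:

    import networkx as nx

    # Directed edge u -> v means "u must be grasped before v".
    rel = nx.DiGraph()
    rel.add_edge("notebook", "box")  # the notebook lies on the box
    rel.add_edge("pen", "notebook")  # the pen lies on the notebook
    rel.add_edge("cup", "box")       # the cup stands on the box

    # A topological order is a grasping sequence that never disturbs a stack.
    print(list(nx.topological_sort(rel)))  # e.g. ['pen', 'cup', 'notebook', 'box']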

Pull Request Descriptions

This is a dataset of over 333K Pull Requests, used for automatic pull request description generation.

6 papers · 0 benchmarks

IWSLT2015

The IWSLT 2015 Evaluation Campaign featured three tracks: automatic speech recognition (ASR), spoken language translation (SLT), and machine translation (MT). For ASR, two tasks were offered, on English and German, while for SLT and MT a number of tasks were proposed, involving English, German, French, Chinese, Czech, Thai, and Vietnamese. All tracks involved the transcription or translation of TED talks, either made available by the official TED website or by other TEDx events. A notable change with respect to previous evaluations was the use of unsegmented speech in the SLT track, in order to better fit a real application scenario.

6 papers · 0 benchmarks

DroneCrowd

DroneCrowd is a benchmark for object detection, tracking and counting algorithms in drone-captured videos. It is a large-scale dataset of 112 drone-captured video clips with 33,600 HD frames covering various scenarios. Notably, it provides annotations for 20,800 people trajectories with 4.8 million heads, along with several video-level attributes.

6 papers · 0 benchmarks · Videos

Bimanual Actions Dataset

The Bimanual Actions Dataset is a collection of 540 RGB-D videos showing subjects performing bimanual actions in a kitchen or workshop context. The main purpose of its compilation is to research bimanual human behaviour in order to eventually improve the capabilities of humanoid robots.

6 papers · 0 benchmarks · RGB-D, Videos
Page 198 of 1000