Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3d meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • Midi (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • Cad (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

CITR & DUT (CITR dataset and DUT dataset)

CITR & DUT consists of two pedestrian trajectory datasets, the CITR dataset and the DUT dataset, provided so that pedestrian motion models can be further calibrated and verified, especially when vehicle influence on pedestrians plays an important role.

3 papers · 0 benchmarks

MDID (Multimodal Document Intent Dataset)

The Multimodal Document Intent Dataset (MDID) is a dataset for computing author intent from multimodal Instagram data. It contains 1,299 Instagram posts covering a variety of topics, annotated with labels from three taxonomies. The samples are labelled with 7 intent labels: Provocative, Informative, Advocative, Entertainment, Expositive, Expressive, and Promotive.

3 papers · 0 benchmarks · Images, Texts

ADE-Affordance

ADE-Affordance is a dataset built upon ADE20k that contains annotations enabling rich visual reasoning.

3 papers · 0 benchmarks · Images, Texts

Large Age-Gap

Large Age-Gap (LAG) is a dataset for face verification. The dataset contains 3,828 images of 1,010 celebrities. For each identity, at least one child/young image and one adult/old image are present.

3 papers · 0 benchmarks · Images

E-GMD (Expanded Groove MIDI Dataset)

Expanded Groove MIDI dataset (E-GMD) is an automatic drum transcription (ADT) dataset that contains 444 hours of audio from 43 drum kits, making it an order of magnitude larger than similar datasets, and the first with human-performed velocity annotations.

3 papers · 0 benchmarks · Audio, Midi

Famulus

This is a dataset for segmentation and classification of epistemic activities in diagnostic reasoning texts.

3 papers · 0 benchmarks · Texts

Ford Campus Vision and Lidar Data Set

Ford Campus Vision and Lidar Data Set is a dataset collected by an autonomous ground vehicle testbed based upon a modified Ford F-250 pickup truck. The vehicle is outfitted with a professional (Applanix POS LV) and a consumer (Xsens MTI-G) Inertial Measurement Unit (IMU), a Velodyne 3D-lidar scanner, two push-broom forward-looking Riegl lidars, and a Point Grey Ladybug3 omnidirectional camera system.

3 papers · 0 benchmarks · LiDAR, Point cloud, Videos

MERL Shopping

MERL Shopping is a dataset for training and testing action detection algorithms. It consists of 106 videos, each of which is a sequence about 2 minutes long. The videos are from a fixed overhead camera looking down at people shopping in a grocery store setting. Each video contains several instances of the following 5 actions: "Reach To Shelf" (reach hand into shelf), "Retract From Shelf" (retract hand from shelf), "Hand In Shelf" (extended period with hand in the shelf), "Inspect Product" (inspect product while holding it in hand), and "Inspect Shelf" (look at shelf while not touching or reaching for the shelf).

3 papers · 0 benchmarks · Videos

KITTI-trajectory-prediction

KITTI is a well-established dataset in the computer vision community. It has often been used for trajectory prediction despite lacking a well-defined split, which has led to non-comparable baselines across different works. This dataset aims to bridge this gap by proposing a well-defined split of the KITTI data. Samples are collected as 6-second chunks (2 seconds of past and 4 seconds of future) in a sliding-window fashion from all trajectories in the dataset, including the ego vehicle. There are a total of 8,613 top-view trajectories for training and 2,907 for testing. Since top-view maps are not provided by KITTI, semantic labels of static categories obtained with DeepLab-v3+ from all frames are projected into a common top-view map using the Velodyne 3D point cloud and IMU. The resulting maps have a spatial resolution of 0.5 meters and are provided along with the trajectories.
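
The sliding-window sampling described above can be sketched roughly as follows. This is an illustration only, not the dataset authors' code: the function name `sliding_chunks`, the 10 Hz frame rate, and the one-frame stride are assumptions that the description does not state.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) position in a top-view frame


def sliding_chunks(
    track: Sequence[Point],
    past_len: int,
    future_len: int,
    stride: int = 1,  # assumed stride; not given in the description
) -> List[Tuple[List[Point], List[Point]]]:
    """Split one trajectory into (past, future) pairs with a sliding window."""
    total = past_len + future_len
    samples = []
    # Tracks shorter than one full window yield no samples.
    for start in range(0, len(track) - total + 1, stride):
        past = list(track[start:start + past_len])
        future = list(track[start + past_len:start + total])
        samples.append((past, future))
    return samples


HZ = 10  # assumed frame rate (hypothetical for this sketch)
# Synthetic 100-frame track moving along the x axis.
track = [(float(t), 0.0) for t in range(100)]
samples = sliding_chunks(track, past_len=2 * HZ, future_len=4 * HZ)
print(len(samples), len(samples[0][0]), len(samples[0][1]))
```

With these assumed values, each sample holds 20 past frames and 40 future frames, and consecutive samples overlap heavily because the window advances one frame at a time.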

3 papers · 0 benchmarks

NinaPro DB2 (DB2 - 40 Intact Subjects - Delsys Trigno electrodes)

The second Ninapro database includes 40 intact subjects and is thoroughly described in the paper: "Manfredo Atzori, Arjan Gijsberts, Claudio Castellini, Barbara Caputo, Anne-Gabrielle Mittaz Hager, Simone Elsig, Giorgio Giatsidis, Franco Bassetto & Henning Müller. Electromyography data for non-invasive naturally-controlled robotic hand prostheses. Scientific Data, 2014" (http://www.nature.com/articles/sdata201453). Please cite this paper for any work related to the Ninapro database. See also the paper by Gijsberts et al., 2014 (http://publications.hevs.ch/index.php/publications/show/1629) for more information about the database.

3 papers · 0 benchmarks · Biomedical, Medical, Time series

BuzzFeed-Webis Fake News Corpus 2016

The BuzzFeed-Webis Fake News Corpus 16 comprises the output of 9 publishers in a week close to the US elections. Among the selected publishers are 6 prolific hyperpartisan ones (three left-wing and three right-wing), and three mainstream publishers (see Table 1). All publishers earned Facebook’s blue checkmark, indicating authenticity and an elevated status within the network. For seven weekdays (September 19 to 23 and September 26 and 27), every post and linked news article of the 9 publishers was fact-checked by professional journalists at BuzzFeed. In total, 1,627 articles were checked, 826 mainstream, 256 left-wing and 545 right-wing. The imbalance between categories results from differing publication frequencies.

3 papers · 0 benchmarks · Texts

FakeNewsAMT & Celebrity

FakeNewsAMT & Celebrity are two novel datasets for the task of fake news detection, covering seven different news domains.

3 papers · 0 benchmarks · Texts

IBC (Individual Brain Charting)

The Individual Brain Charting (IBC) project aims at providing a new generation of functional-brain atlases. To map cognitive mechanisms at a fine scale, task-fMRI data at high spatial resolution are being acquired on a fixed cohort of 12 participants while they perform many different tasks. These data, free from both inter-subject and inter-site variability, are publicly available as a means to support the investigation of functional segregation and connectivity as well as individual variability, with a view to establishing a better link between brain systems and behavior.

3 papers · 0 benchmarks · MRI, Medical, fMRI

MIT-BIH AFDB (MIT-BIH Atrial Fibrillation Database)

This database includes 25 long-term ECG recordings of human subjects with atrial fibrillation (mostly paroxysmal).

3 papers · 0 benchmarks · Medical, Time series

PNT (Parsing Time Normalizations)

The Parsing Time Normalizations (PNT) corpus in SCATE format allows the representation of a wider variety of time expressions than previous approaches. This corpus was released with SemEval 2018 Task 6.

3 papers · 1 benchmark · Texts

WHU-Specular dataset

WHU-Specular is a large dataset of annotated specular highlight regions created from real-world images. It can be used for the specular highlight detection task. It contains 4,310 image pairs (specular images and corresponding highlight masks), of which 3,017 were randomly selected as the training set and the remaining 1,293 as the testing set.

3 papers · 0 benchmarks · Images

AW-OIE (All Words OpenIE)

All Words Open IE (AW-OIE) is an open information extraction dataset derived from the Question-Answer Meaning Representation (QAMR) dataset.

3 papers · 0 benchmarks · Texts

RailEye3D Dataset

The RailEye3D dataset, a collection of train-platform scenarios for applications targeting passenger safety and automation of train dispatching, consists of 10 image sequences captured at 6 railway stations in Austria. Annotations for multi-object tracking are provided in both a unified format and the ground-truth format used in the MOTChallenge.

3 papers · 0 benchmarks · Images

PWDB (Pulse Wave Database)

This database of simulated arterial pulse waves is designed to be representative of a sample of pulse waves measured from healthy adults. It contains pulse waves for 4,374 virtual subjects, aged from 25 to 75 years (in 10-year increments). The database contains a baseline set of pulse waves for each of the six age groups, created using cardiovascular properties (such as heart rate and arterial stiffness) that are representative of healthy subjects in each age group. It also contains 728 further virtual subjects in each age group, in which each of the cardiovascular properties is varied within normal ranges. This allows for extensive in silico analyses of haemodynamics and the performance of pulse wave analysis algorithms.
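
As a quick consistency check on the counts above, the short sketch below recomputes the total number of virtual subjects. It assumes exactly one baseline subject per age group, which the description implies but does not state outright; all variable names are illustrative.

```python
# Age groups from 25 to 75 years in 10-year increments.
ages = list(range(25, 76, 10))

baseline_per_group = 1   # assumption: one baseline subject per age group
varied_per_group = 728   # further virtual subjects per age group (from the description)

total_subjects = len(ages) * (baseline_per_group + varied_per_group)
print(len(ages), total_subjects)
```

Under that assumption, the six age groups of 729 subjects each account for the 4,374 virtual subjects quoted in the description.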

3 papers · 0 benchmarks · Biology, Biomedical, Medical, Time series

Medico automatic polyp segmentation challenge (dataset)

The “Medico automatic polyp segmentation challenge” aims to develop computer-aided diagnosis systems for automatic polyp segmentation to detect all types of polyps (for example, irregular polyp, smaller or flat polyps) with high efficiency and accuracy. The main goal of the challenge is to benchmark semantic segmentation algorithms on a publicly available dataset, emphasizing robustness, speed, and generalization.

3 papers · 5 benchmarks · Biomedical, Images, Medical