Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images: 3,275
  • Texts: 3,148
  • Videos: 1,019
  • Audio: 486
  • Medical: 395
  • 3D: 383
  • Time series: 298
  • Graphs: 285
  • Tabular: 271
  • Speech: 199
  • RGB-D: 192
  • Environment: 148
  • Point cloud: 135
  • Biomedical: 123
  • LiDAR: 95
  • RGB Video: 87
  • Tracking: 78
  • Biology: 71
  • Actions: 68
  • 3d meshes: 65
  • Tables: 52
  • Music: 48
  • EEG: 45
  • Hyperspectral images: 45
  • Stereo: 44
  • MRI: 39
  • Physics: 32
  • Interactive: 29
  • Dialog: 25
  • Midi: 22
  • 6D: 17
  • Replay data: 11
  • Financial: 10
  • Ranking: 10
  • Cad: 9
  • fMRI: 7
  • Parallel: 6
  • Lyrics: 2
  • PSG: 2

Crowd Dataset

A dense crowd dataset with manually annotated ground truth, collected from different public datasets. It comprises 20 videos that exhibit a multitude of motion behaviors, covering both obvious and subtle instabilities.

1 paper | 0 benchmarks

CrowdFix

The work contributes (1) a review of the dynamics behind saliency and crowds and (2) a dynamic human eye fixation dataset, created with eye tracking over a new set of crowd videos gathered from the Internet. The videos are annotated into three distinct density levels.

1 paper | 0 benchmarks

CUHK-QA

CUHK-QA is a dataset for natural language-based person search using iterative questioning.

1 paper | 0 benchmarks | Images, Texts

Curated AFD

The Curated AFD dataset is a curated version of the Asian Face Dataset (AFD) for face recognition research. The original AFD dataset has been curated to remove wrong identity labels, duplicate images and duplicate subjects.

1 paper | 0 benchmarks | Images

Curation Corpus

The Curation Corpus is a collection of 40,000 professionally-written summaries of news articles, with links to the articles themselves.

1 paper | 1 benchmark

CzEng 2.0 Parallel Corpus

CzEng 2.0 is a Czech-English parallel corpus of over 2 billion words (2 "gigawords") in each language. The corpus contains document-level information and is filtered with several techniques to reduce noise.

1 paper | 0 benchmarks | Texts

D2City

A large-scale, comprehensive collection of dashcam videos recorded by vehicles on DiDi's platform. D2-City contains more than 10,000 video clips that reflect the diversity and complexity of real-world traffic scenarios in China.

1 paper | 0 benchmarks | Videos

DAIS

A large benchmark dataset containing 50K human judgments for 5K distinct sentence pairs in the English dative alternation. This dataset includes 200 unique verbs and systematically varies the definiteness and length of arguments.

1 paper | 0 benchmarks | Texts

Da Vinci Dataset

A line drawing restoration dataset which consists of 71 line drawing sketches by Leonardo Da Vinci.

1 paper | 0 benchmarks

DBpedia NIF

The dataset provides the content of all articles for 128 Wikipedia languages. The dataset has been further enriched with about 25% more links and selected partitions published as Linked Data.

1 paper | 0 benchmarks | Texts

DeSMOG

A dataset of stance-labeled global warming (GW) sentences.

1 paper | 0 benchmarks

DET

DET is a lane detection dataset that consists of raw event data, images accumulated over 30 ms, and the corresponding lane labels. It contains 17,103 lane instances, each manually labeled pixel by pixel.

1 paper | 4 benchmarks

Diabetic Foot Ulcers Classification Datasets (DTU)

Contains Diabetic Foot Ulcers (DFU) from different patients.

1 paper | 0 benchmarks

Diseases in Neurology Case Reports Dataset

Diseases and syndromes (DsSs) extracted from more than 65,000 neurology case reports published in 66 PubMed journals over six decades, from 1955 to 2017.

1 paper | 0 benchmarks

DLBCL-Morph

DLBCL-Morph is a dataset containing 42 digitally scanned high-resolution tissue microarray (TMA) slides accompanied by clinical, cytogenetic, and geometric features from 209 DLBCL cases.

1 paper | 0 benchmarks | Medical

DpgMedia2019

DpgMedia2019 is a Dutch news dataset for partisanship detection. It contains more than 100K articles that are labelled on the publisher level and 776 articles that were crowdsourced using an internal survey platform and labelled on the article level.

1 paper | 0 benchmarks | Texts

Drone Tracking

This dataset contains videos in which a flying drone (hexacopter) is captured by multiple consumer-grade cameras (smartphones, compact cameras, GoPro, ...), with highly accurate 3D drone trajectory ground truth recorded by a precise real-time RTK system from Fixposition. For some videos, ground truth temporal synchronization and ground truth camera locations are also provided.

1 paper | 0 benchmarks | Images

Edge-Map-345C

Edge-Map-345C is a large-scale edge-map dataset of 290,281 edge-maps covering 345 object categories. These categories correspond to the 345 free-hand sketch categories of the Google QuickDraw dataset.

1 paper | 0 benchmarks | Images

Edina-DR

Edina-DR is a novel corpus of discourse relation pairs; the first of its kind to attempt to identify the discourse relations connecting the dialogic turns in open-domain discourse.

1 paper | 0 benchmarks

Egyptian Arabic Segmentation Dataset

Contains 350 tweets with more than 8,000 words, including 3,000 unique words, written in the Egyptian dialect. The tweets are rich in dialectal content and cover most Egyptian dialectal phonological, morphological, and syntactic phenomena. The dataset also includes Twitter-specific aspects of the text, such as #hashtags, @mentions, emoticons, and URLs.

1 paper | 0 benchmarks