Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

19,997 dataset results

Poser

The Poser dataset is a dataset for pose estimation which consists of 1927 training and 418 test images. These images are synthetically generated and tuned to unimodal predictions. The images were generated using the Poser software package.

2 papers · 0 benchmarks · Images

Retinal Microsurgery

The Retinal Microsurgery dataset is a dataset for surgical instrument tracking. It consists of 18 in-vivo sequences, each with 200 frames at a resolution of 1920 × 1080 pixels. The dataset is further divided into four instrument-dependent subsets. Each frame is annotated with n = 3 tool joints and c = 2 semantic classes (tool and background).

2 papers · 0 benchmarks · Images

L-Bird (Large-Bird)

The L-Bird (Large-Bird) dataset contains nearly 4.8 million images, obtained by querying the Internet for images of 10,982 bird species.

2 papers · 0 benchmarks · Images

MuseScore

The MuseScore dataset is a collection of 344,166 audio and MIDI pairs downloaded from the MuseScore website. The audio is usually synthesized by the MuseScore synthesizer. The audio clips cover diverse musical genres and are about two minutes long on average.

2 papers · 0 benchmarks · Audio

LibriCount

LibriCount is a synthetic dataset for speaker count estimation. The dataset simulates a cocktail-party environment with 0 to 10 speakers, mixed at 0 dB SNR from random utterances of different speakers drawn from the LibriSpeech test-clean set. All recordings are 5 s long, and every speaker is active for most of the recording (a minimal mixing sketch follows below).

2 papers · 0 benchmarks · Audio
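Mixing at 0 dB SNR amounts to adding each utterance at the same energy. The sketch below illustrates that idea with random signals; it is a simplified illustration only, not the actual LibriCount generation pipeline, and the function name and RMS target are assumptions.

```python
import numpy as np

def mix_equal_energy(utterances, target_rms=0.05):
    """Scale each utterance to the same RMS (i.e. 0 dB relative to each other)
    and sum them. Simplified stand-in for the LibriCount mixing step."""
    mixture = np.zeros_like(utterances[0])
    for u in utterances:
        rms = np.sqrt(np.mean(u ** 2)) + 1e-12
        mixture += u * (target_rms / rms)
    return mixture

# Toy example: three random "speakers", 5 s at 16 kHz (all the same length).
sr = 16000
speakers = [np.random.randn(5 * sr) for _ in range(3)]
mix = mix_equal_energy(speakers)
print(mix.shape)  # (80000,)
```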

AG’s Corpus (AG’s corpus of news articles)

Antonio Gulli’s corpus of news articles is a collection of more than 1 million news articles. The articles were gathered from more than 2,000 news sources by ComeToMyHead over more than one year of activity. ComeToMyHead is an academic news search engine that has been running since July 2004. The dataset is provided by the academic community for research purposes in data mining (clustering, classification, etc.), information retrieval (ranking, search, etc.), XML, data compression, data streaming, and any other non-commercial activity.

2 papers · 0 benchmarks · Texts

ArxivPapers

The ArxivPapers dataset is an unlabelled collection of over 104K machine learning papers published on arXiv.org between 2007 and 2020. The dataset includes around 94K papers (those for which LaTeX source code is available) in a structured form in which each paper is split into title, abstract, sections, paragraphs, and references (one possible record layout is sketched below). Additionally, the dataset contains over 277K tables extracted from the LaTeX sources.

2 papers · 0 benchmarks · Tables, Texts
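The structured form described above suggests one nested record per paper. The dataclasses below are a hypothetical layout for such a record; the field names are assumptions for illustration, not the dataset's published schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Section:
    # One section of a paper, split into paragraphs.
    title: str
    paragraphs: List[str] = field(default_factory=list)

@dataclass
class PaperRecord:
    # Hypothetical per-paper record: title, abstract, sections, references.
    title: str
    abstract: str
    sections: List[Section] = field(default_factory=list)
    references: List[str] = field(default_factory=list)

paper = PaperRecord(
    title="An Example Paper",
    abstract="We study an example problem.",
    sections=[Section("Introduction", ["First paragraph.", "Second paragraph."])],
    references=["Example et al., 2019"],
)
print(len(paper.sections), "section(s)")
```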

SegmentedTables

The SegmentedTables dataset is a collection of almost 2,000 tables extracted from 352 machine learning papers. Each table consists of rich text content, layout, and caption. Tables are annotated with types (leaderboard, ablation, irrelevant), and cells of relevant tables are annotated with semantic roles such as “paper model”, “competing model”, “dataset”, and “metric” (a hypothetical annotation layout is sketched below).

2 papers · 0 benchmarks · Tables, Texts
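To make the annotation scheme concrete, here is a hypothetical sketch of one annotated table as a plain Python dict. The key names and storage format are assumptions; the released data may use a different representation.

```python
# Hypothetical layout of one SegmentedTables annotation (illustrative only).
annotated_table = {
    "paper_id": "example-paper",
    "caption": "Table 2: Results on the test set.",
    "table_type": "leaderboard",  # one of: leaderboard, ablation, irrelevant
    "cells": [
        {"row": 0, "col": 1, "text": "Accuracy",  "role": "metric"},
        {"row": 1, "col": 0, "text": "ResNet-50", "role": "competing model"},
        {"row": 2, "col": 0, "text": "Ours",      "role": "paper model"},
        {"row": 2, "col": 1, "text": "76.3",      "role": None},
    ],
}

# Example query: cells describing the paper's own model.
own_model_cells = [c for c in annotated_table["cells"] if c["role"] == "paper model"]
print(own_model_cells)
```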

DispScenes

The DispScenes dataset was created to address the specific problem of disparate image matching. The image pairs exhibit high levels of variation in illumination and viewpoint and also contain instances of occlusion. The DispScenes dataset provides manual ground-truth keypoint correspondences for all images.

2 papers · 0 benchmarks · Images

Freiburg Spatial Relations

The Freiburg Spatial Relations dataset features 546 scenes, each containing two out of 25 household objects. The depicted spatial relations can roughly be described as on top, on top on the corner, inside, inside and inclined, next to, and inclined. The dataset contains the 25 object models as textured .obj and .dae files, a low-resolution .dae version for visualization in rviz, a scene description file containing the translation and rotation of the objects for each scene, a file with labels for each scene, the 15 splits used for cross-validation, and a bash script to convert the models to point clouds.

2 papers · 0 benchmarks · Point cloud

StarData

StarData is a StarCraft: Brood War replay dataset, with 65,646 games. The full dataset after compression is 365 GB, 1535 million frames, and 496 million player actions. The entire frame data was dumped out at 8 frames per second.

2 papers · 0 benchmarks · Actions, Replay data

Spades (Semantic PArsing of DEclarative Sentences)

The Spades dataset contains 93,319 questions derived from ClueWeb09 sentences. Specifically, the questions were created by randomly removing an entity, producing sentence-denotation pairs.

2 papers · 0 benchmarks · Texts

Chinese Gigaword

The Chinese Gigaword corpus consists of 2.2M headline-document pairs of news stories covering over 284 months from two Chinese newspapers, namely the Xinhua News Agency of China (XIN) and the Central News Agency of Taiwan (CNA).

2 papers · 0 benchmarks · Texts

Stanford-ECM

Stanford-ECM is an egocentric multimodal dataset comprising about 27 hours of egocentric video augmented with heart rate and acceleration data. Individual videos range from 3 minutes to about 51 minutes in length. A mobile phone was used to collect egocentric video at 720x1280 resolution and 30 fps, as well as triaxial acceleration at 30 Hz. The mobile phone was equipped with a wide-angle lens, so that the horizontal field of view was enlarged from 45 degrees to about 64 degrees. A wrist-worn heart rate sensor captured the heart rate every 5 seconds. The phone and heart rate monitor were time-synchronized through Bluetooth, and all data was stored in the phone’s storage. Piecewise cubic polynomial interpolation was used to fill in any gaps in the heart rate data (see the interpolation sketch below). Finally, the data was aligned to the millisecond level at 30 Hz.

2 papers · 0 benchmarks · Audio, Videos
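The heart-rate stream is sampled every ~5 s while the video and accelerometer run at 30 Hz, hence the interpolation step. Below is a minimal scipy sketch of piecewise cubic interpolation onto a 30 Hz grid; it is an illustration of the technique on toy values, not the dataset's actual preprocessing code.

```python
import numpy as np
from scipy.interpolate import CubicSpline

# Heart rate sampled roughly every 5 seconds (timestamps in seconds, values in bpm).
hr_times = np.array([0.0, 5.0, 10.0, 15.0, 20.0])
hr_values = np.array([72.0, 74.0, 71.0, 70.0, 73.0])

# Piecewise cubic interpolation, evaluated on a 30 Hz grid so the heart-rate
# stream lines up with the 30 fps video and 30 Hz accelerometer.
spline = CubicSpline(hr_times, hr_values)
grid_30hz = np.arange(hr_times[0], hr_times[-1], 1.0 / 30.0)
hr_30hz = spline(grid_30hz)
print(hr_30hz.shape)  # (600,)
```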

MSRA10K (MSRA10K Salient Object Database)

MSRA10K is a dataset for salient object detection that provides pixel-level saliency labels for 10,000 images drawn from the MSRA salient object detection dataset. The original MSRA database provides salient object annotations only as bounding boxes drawn by 3 to 9 users.

2 papers · 0 benchmarks · Images

MUSE

The MUSE dataset contains bilingual dictionaries for 110 pairs of languages. For each language pair, the training seed dictionaries contain approximately 5,000 word pairs, while the evaluation sets contain 1,500 word pairs (a minimal loader sketch follows below).

2 papers · 0 benchmarks · Texts
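MUSE dictionaries are distributed as plain-text word-pair files. Assuming the usual one whitespace-separated source/target pair per line, a minimal loader might look like this (the file name in the comment is a placeholder).

```python
from collections import defaultdict

def load_bilingual_dict(path):
    """Load a MUSE-style dictionary: one whitespace-separated word pair per line.
    A source word can map to several targets, so values are sets."""
    mapping = defaultdict(set)
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.split()
            if len(parts) == 2:
                src, tgt = parts
                mapping[src].add(tgt)
    return mapping

# Example usage (placeholder file name for a ~5,000-pair training seed):
# pairs = load_bilingual_dict("en-fr.0-5000.txt")
# print(len(pairs), "source words")
```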

ICL-NUIM

The ICL-NUIM dataset aims at benchmarking RGB-D, visual odometry, and SLAM algorithms. Two different scenes (the living room and the office room) are provided with ground truth. The living room has 3D surface ground truth together with depth maps and camera poses, so it is suitable not just for benchmarking camera trajectories but also reconstruction (a trajectory-error sketch follows below). The office room scene comes with trajectory data only and does not have an explicit 3D model.

2 papers · 0 benchmarks · Images, RGB-D
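Since the dataset is meant for trajectory benchmarking, a common metric is the absolute trajectory error (ATE). The numpy sketch below computes the RMSE of per-frame translational differences; it assumes the two trajectories are already time-associated and rigidly aligned, which real evaluation scripts normally handle first.

```python
import numpy as np

def ate_rmse(gt_positions, est_positions):
    """Absolute trajectory error: RMSE of per-frame translational differences.
    Both inputs are (N, 3) arrays of camera positions, assumed aligned."""
    diffs = gt_positions - est_positions
    return float(np.sqrt(np.mean(np.sum(diffs ** 2, axis=1))))

# Toy example: an estimated trajectory that is a noisy copy of the ground truth.
gt = np.cumsum(np.random.randn(100, 3) * 0.01, axis=0)
est = gt + np.random.randn(100, 3) * 0.005
print(f"ATE RMSE: {ate_rmse(gt, est):.4f} m")
```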

Middlebury MVS

Middlebury MVS is the earliest MVS dataset for multi-view stereo network evaluation. It contains two indoor objects with low-resolution (640 × 480) images and calibrated cameras.

2 papers · 0 benchmarks · Stereo

Middlebury 2003

Middlebury 2003 is a stereo dataset for indoor scenes.

2 papers · 0 benchmarks · Images, Stereo

URBAN-SED

URBAN-SED is a dataset of 10,000 soundscapes with sound event annotations generated using the Scaper library. The dataset totals almost 30 hours and includes close to 50,000 annotated sound events. Every soundscape is 10 seconds long and has a background of Brownian noise resembling the typical “hum” often heard in urban environments. Every soundscape contains between 1 and 9 sound events from the following classes: air_conditioner, car_horn, children_playing, dog_bark, drilling, engine_idling, gun_shot, jackhammer, siren and street_music. The source material for the sound events are the clips from the UrbanSound8K dataset (a simplified construction sketch follows below). URBAN-SED comes pre-sorted into three sets: train, validate and test. There are 6,000 soundscapes in the training set, generated using clips from folds 1-6 in UrbanSound8K, 2,000 soundscapes in the validation set, generated using clips from folds 7-8, and 2,000 soundscapes in the test set, generated using clips from folds 9-10.

2 papers · 0 benchmarks · Audio
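Each soundscape combines a Brownian-noise bed with 1 to 9 foreground events. The numpy sketch below illustrates that construction in a highly simplified way (Brownian noise as integrated white noise, events pasted in at random offsets); the real dataset was generated with the Scaper soundscape-synthesis library, which also handles levels, SNRs, and annotation files.

```python
import numpy as np

def make_soundscape(events, sr=44100, duration=10.0, seed=0):
    """Toy soundscape: quiet Brownian-noise background plus events at random offsets.
    A simplified stand-in for the Scaper-based URBAN-SED pipeline."""
    rng = np.random.default_rng(seed)
    n = int(sr * duration)

    # Brownian noise = integrated white noise, rescaled to a quiet background level.
    background = np.cumsum(rng.standard_normal(n))
    background *= 0.05 / (np.max(np.abs(background)) + 1e-12)

    scape = background.copy()
    for ev in events:  # each event is a 1-D array shorter than the soundscape
        start = rng.integers(0, n - len(ev))
        scape[start:start + len(ev)] += ev
    return scape

# Toy example: three 1-second "events" over a 10 s background.
sr = 44100
events = [0.1 * np.random.randn(sr) for _ in range(3)]
print(make_soundscape(events, sr=sr).shape)  # (441000,)
```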
Page 297 of 1000