TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images3,275
  • Texts3,148
  • Videos1,019
  • Audio486
  • Medical395
  • 3D383
  • Time series298
  • Graphs285
  • Tabular271
  • Speech199
  • RGB-D192
  • Environment148
  • Point cloud135
  • Biomedical123
  • LiDAR95
  • RGB Video87
  • Tracking78
  • Biology71
  • Actions68
  • 3d meshes65
  • Tables52
  • Music48
  • EEG45
  • Hyperspectral images45
  • Stereo44
  • MRI39
  • Physics32
  • Interactive29
  • Dialog25
  • Midi22
  • 6D17
  • Replay data11
  • Financial10
  • Ranking10
  • Cad9
  • fMRI7
  • Parallel6
  • Lyrics2
  • PSG2

19,997 dataset results

3D-ZeF (3D ZebraFish Tracking Benchmark)

3D-ZeF dataset consists of eight sequences with a duration between 15-120 seconds and 1-10 free moving zebrafish. The videos have been annotated with a total of 86,400 points and bounding boxes.

1 papers0 benchmarks

AAVE/SAE Paired Dataset

AAVE/SAE Paired Dataset contains 2019 intent-equivalent AAVE/SAE pairs. The AAVE (African-American Vernacular English) samples are sampled from Blodgett et. al. (2016)'s TwitterAAE, with their corresponding SAE (Standard American English) samples annotated by Amazon MTurk.

1 papers0 benchmarksTexts

ActioNet

ActioNet is a video task-based dataset collected in a synthetic 3D environment. It contains 3,038 annotated videos and hierarchical task structures over 65 individual household tasks from 120 different scenes. Each task is annotated across three to five different scenes by 10 different annotators. The tasks can be broken down into four categories: living room, bedroom, bathroom, kitchen.

1 papers0 benchmarksVideos

ActivityNet Thumbnails

Consists of 10,000+ video-sentence pairs with each accompanied by an annotated sentence specified video thumbnail.

1 papers0 benchmarksVideos

Advice Seeking Questions

The Advice-Seeking Questions (ASQ) dataset is a collection of personal narratives with advice-seeking questions. The dataset has been split into train, test, heldout sets, with 8865, 2500, 10000 test instances each. This dataset is used to train and evaluate methods that can infer what is the advice-seeking goal behind a personal narrative. This task is formulated as a cloze test, where the goal is to identify which of two advice-seeking questions was removed from a given narrative.

1 papers0 benchmarksTexts

AGRR-2019

Consists of 7.5k sentences with gapping (as well as 15k relevant negative sentences) and comprises data from various genres: news, fiction, social media and technical texts. The dataset was prepared for the Automatic Gapping Resolution Shared Task for Russian (AGRR-2019) - a competition aimed at stimulating the development of NLP tools and methods for processing of ellipsis.

1 papers0 benchmarksTexts

Alexa Point of View

The Alexa Point of View dataset is point of view conversion dataset, a parallel corpus of messages spoken to a virtual assistant and the converted messages for delivery. The dataset contains parallel corpus of input (input column) message and POV converted messages (output column). An example of a pair is tell @CN@ that i'll be late [\t] hi @CN@, @SCN@ would like you to know that they'll be late. The input and pov-converted output pair is tab separated. @CN@ tag is a placeholder for the contact name (receiver) and @SCN@ tag is a placeholder for source contact name (sender). The total dataset has 46563 pairs. This data is then test/train/dev split into 6985 pairs/32594 pairs/6985 pairs.

1 papers1 benchmarksTexts

AML Robot Cutting Dataset

The AML Robot Cutting Dataset consists of approximately 1500 seconds of real data collected on Kinova Jaco 2 robot retrofitted with a custom end-effector fixture and dremel performing cutting tasks on wood specimens for 5 materials and 5 thicknesses.

1 papers0 benchmarks

ApartmenTour

Contains a large number of online videos and subtitles.

1 papers0 benchmarksTexts, Videos

Armenian Paraphrase Detection Corpus

This dataset contains 2,360 paraphrases in Armenian that can be used for paraphrase detection. The dataset is constructed by back-translating sentences from Armenian to English twice, and manually filtering the result.

1 papers0 benchmarksTexts

Arxiv Academic Paper Dataset

A dataset to enable automatic academic paper rating.

1 papers0 benchmarks

AskParents

AskParents is a dataset for advice classification extracted from Reddit. In this dataset, posts are annotated for whether they contain advice or not. It contains 8,701 samples for training, 802 for validation and 1,091 for testing.

1 papers0 benchmarksTexts

Astyx HiRes2019

A radar-centric automotive dataset based on radar, lidar and camera data for the purpose of 3D object detection.

1 papers0 benchmarks

Automating Dynamic Consent

This dataset is used to evaluate a predictive consent model for users’ information shared in social media. In this task, the goal is to predict whether the users will give their consent to share that data with different hypothetical audiences within a medical context. The dataset is built from information the users posted on Facebook and their consent answers about each piece of information.

1 papers0 benchmarksTexts

AuxAD

AuxAD is a a distantly supervised dataset for acronym disambiguation.

1 papers0 benchmarksTexts

AVECL-UMons

A dataset for audio-visual event classification and localization in the context of office environments. The audio-visual dataset is composed of 11 event classes recorded at several realistic positions in two different rooms. Two types of sequences are recorded according to the number of events in the sequence. The dataset comprises 2662 unilabel sequences and 2724 multilabel sequences corresponding to a total of 5.24 hours.

1 papers0 benchmarksVideos

BCWS (Bilingual Contextual Word Similarity)

Dataset for evaluating English-Chinese Bilingual Contextual Word Similarity. The dataset consists of 2,091 English-Chinese word pairs with the corresponding sentential contexts and their similarity scores annotated by the human.

1 papers0 benchmarksTexts

Berkeley DeepDrive Video

A dataset comprised of real driving videos and GPS/IMU data. The BDDV dataset contains diverse driving scenarios including cities, highways, towns, and rural areas in several major cities in US.

1 papers0 benchmarks

Bianet

Bianet is a parallel news corpus in Turkish, Kurdish and English It contains 3,214 Turkish articles with their sentence-aligned Kurdish or English translations from the Bianet online newspaper.

1 papers0 benchmarksTexts

BigHand2.2M Benchmark

A large-scale hand pose dataset, collected using a novel capture method.

1 papers0 benchmarksImages
PreviousPage 366 of 1000Next