Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

19,997 dataset results

Fluent Speech Commands

Fluent Speech Commands is an open source audio dataset for spoken language understanding (SLU) experiments. Each utterance is labeled with "action", "object", and "location" values; for example, "turn the lights on in the kitchen" has the label {"action": "activate", "object": "lights", "location": "kitchen"}. A model must predict each of these values, and a prediction for an utterance is deemed to be correct only if all values are correct.

57 papers · 3 benchmarks · Audio
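Because a prediction counts only when all three slots are right, this metric is stricter than per-slot accuracy. Below is a minimal sketch of the exact-match computation; the function and variable names are illustrative, not taken from the dataset's tooling:

```python
def slu_exact_match(predictions, references):
    """Fraction of utterances where action, object, and location ALL match."""
    slots = ("action", "object", "location")
    hits = sum(
        all(pred[s] == ref[s] for s in slots)
        for pred, ref in zip(predictions, references)
    )
    return hits / len(references)

# "turn the lights on in the kitchen" counts as correct only because
# every one of the three slot values matches the reference.
refs = [{"action": "activate", "object": "lights", "location": "kitchen"}]
preds = [{"action": "activate", "object": "lights", "location": "kitchen"}]
assert slu_exact_match(preds, refs) == 1.0
```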

Dark Zurich

Dark Zurich is an image dataset containing a total of 8779 images captured at nighttime, twilight, and daytime, along with the respective GPS coordinates of the camera for each image. These GPS annotations are used to construct cross-time-of-day correspondences, i.e., to match each nighttime or twilight image to its daytime counterpart.

57 papers · 3 benchmarks · Images

Assembly101

Assembly101 is a new procedural activity dataset featuring 4321 videos of people assembling and disassembling 101 "take-apart" toy vehicles. Participants work without fixed instructions, and the sequences feature rich and natural variations in action ordering, mistakes, and corrections. Assembly101 is the first multi-view action dataset with simultaneous static (8) and egocentric (4) recordings. Sequences are annotated with more than 100K coarse and 1M fine-grained action segments, and 18M 3D hand poses. We benchmark on three action understanding tasks: recognition, anticipation, and temporal segmentation. Additionally, we propose a novel task of detecting mistakes. The unique recording format and rich set of annotations allow us to investigate generalization to new toys, cross-view transfer, long-tailed distributions, and pose vs. appearance. We envision that Assembly101 will serve as a new challenge for investigating various activity understanding problems.

57 papers · 48 benchmarks · Videos

MSLS (Mapillary Street-level Sequences Dataset)

MSLS is the largest and most diverse dataset to date for lifelong place recognition from image sequences in urban and suburban settings.

57 papers · 2 benchmarks

Open-X-Embodiment

Open-X-Embodiment is a robot manipulation dataset; see https://robotics-transformer-x.github.io/ for details.

57 papers · 0 benchmarks

VOT2017 (Visual Object Tracking Challenge)

VOT2017 is a visual object tracking dataset containing 60 short sequences annotated with 6 different attributes.

56 papers · 2 benchmarks · Tracking, Videos

PA-100K (PA-100K Dataset)

PA-100K is a recently proposed large-scale pedestrian attribute dataset with 100,000 images in total, collected from outdoor surveillance cameras. It is split into 80,000 images for the training set, 10,000 for the validation set, and 10,000 for the test set. Each image is labeled with 26 binary attributes. The images are blurry due to the relatively low resolution, and the positive ratio of each binary attribute is low.

56 papers · 6 benchmarks · Images
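To make the class-imbalance remark concrete: with the annotations loaded as an N × 26 binary matrix, the positive ratio of each attribute is simply a column mean. The array below is a random toy stand-in, not the actual PA-100K labels:

```python
import numpy as np

# Toy stand-in: one row per image, one column per binary attribute.
rng = np.random.default_rng(seed=0)
labels = rng.binomial(n=1, p=0.1, size=(100_000, 26))

positive_ratio = labels.mean(axis=0)  # fraction of positive images per attribute
print(positive_ratio.round(3))        # low values reflect imbalanced attributes
```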

RTE (Recognizing Textual Entailment)

The Recognizing Textual Entailment (RTE) datasets come from a series of textual entailment challenges. Data from RTE1, RTE2, RTE3, and RTE5 is combined. Examples are constructed from news and Wikipedia text.

56 papers · 2 benchmarks · Texts

CoNLL++

CoNLL++ is a corrected version of the CoNLL03 NER dataset where 5.38% of the test sentences have been fixed.

56 papers · 1 benchmark

UCI Machine Learning Repository

UCI Machine Learning Repository is a collection of over 550 datasets.

56 papers · 0 benchmarks

ETH-XGaze

ETH-XGaze consists of over one million high-resolution images of varying gaze under extreme head poses. The dataset was collected from 110 participants using a custom hardware setup with 18 digital SLR cameras, adjustable illumination conditions, and a calibrated system to record ground-truth gaze targets.

56 papers · 1 benchmark · Images

MINC (Materials in Context Database)

MINC is a large-scale, open dataset of materials in the wild.

56 papers · 0 benchmarks · Images, Texts

N-CARS

N-CARS is a large real-world event-based dataset for object classification.

56 papers · 7 benchmarks

MasakhaNER

MasakhaNER is a collection of Named Entity Recognition (NER) datasets for 10 different African languages. The languages forming this dataset are: Amharic, Hausa, Igbo, Kinyarwanda, Luganda, Luo, Nigerian-Pidgin, Swahili, Wolof, and Yorùbá.

56 papers · 0 benchmarks · Texts

Re-TACRED (Revised-TACRED)

The Re-TACRED dataset is a significantly improved version of the TACRED dataset for relation extraction. Using new crowd-sourced labels, Re-TACRED prunes poorly annotated sentences and addresses TACRED's relation definition ambiguity, ultimately correcting 23.9% of TACRED labels. The dataset contains over 91 thousand sentences spread across 40 relations, and was presented at AAAI 2021.

56 papers · 1 benchmark · Texts

GazeCapture (Eye Tracking for Everyone)

From scientific research to commercial applications, eye tracking is an important tool across many domains. Despite its range of applications, eye tracking has yet to become a pervasive technology. We believe that we can put the power of eye tracking in everyone's palm by building eye tracking software that works on commodity hardware such as mobile phones and tablets, without the need for additional sensors or devices. We tackle this problem by introducing GazeCapture, the first large-scale dataset for eye tracking, containing data from over 1,450 people and almost 2.5M frames. Using GazeCapture, we train iTracker, a convolutional neural network for eye tracking, which achieves a significant reduction in error over previous approaches while running in real time (10–15 fps) on a modern mobile device. Our model achieves a prediction error of 1.7 cm and 2.5 cm without calibration on mobile phones and tablets, respectively. With calibration, this is reduced to 1.3 cm and 2.1 cm.

56 papers · 2 benchmarks

Adult

Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records was extracted using the following conditions: ((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0)).

56 papers · 3 benchmarks · Tabular
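For anyone reproducing the extraction, the quoted condition maps one-to-one onto a row filter. Here is a hedged pandas sketch over a toy stand-in table; the column names AAGE, AGI, AFNLWGT, and HRSWK are taken from the condition above, not from a verified schema:

```python
import pandas as pd

# Toy stand-in for the raw 1994 Census extract (three illustrative rows).
census = pd.DataFrame({
    "AAGE":    [25, 15, 40],
    "AGI":     [5000, 200, 50],
    "AFNLWGT": [180000, 120000, 90000],
    "HRSWK":   [40, 0, 35],
})

# Becker's stated filter: ((AAGE>16) && (AGI>100) && (AFNLWGT>1) && (HRSWK>0))
clean = census[
    (census["AAGE"] > 16)
    & (census["AGI"] > 100)
    & (census["AFNLWGT"] > 1)
    & (census["HRSWK"] > 0)
]
print(clean)  # only the first row satisfies all four conditions
```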

Road Anomaly

This dataset contains images of unusual dangers that a vehicle can encounter on the road – animals, rocks, traffic cones, and other obstacles. Its purpose is to test autonomous driving perception algorithms in rare but safety-critical circumstances.

56 papers · 2 benchmarks · Images

BEAT (Body-Expression-Audio-Text)

BEAT provides (i) 76 hours of high-quality, multi-modal data captured from 30 speakers talking with eight different emotions and in four different languages, and (ii) 32 million frame-level emotion and semantic-relevance annotations. Our statistical analysis on BEAT demonstrates the correlation of conversational gestures with facial expressions, emotions, and semantics, in addition to the known correlation with audio, text, and speaker identity. Based on this observation, we propose a baseline model, Cascaded Motion Network (CaMN), which models the above six modalities in a cascaded architecture for gesture synthesis. To evaluate semantic relevancy, we introduce a metric, Semantic Relevance Gesture Recall (SRGR). Qualitative and quantitative experiments demonstrate the metrics' validity, the quality of the ground-truth data, and the baseline's state-of-the-art performance.

56 papers · 2 benchmarks · 3D, 3D meshes, Actions, Audio, Speech, Texts

Jobs

The Jobs dataset by LaLonde [36] is a widely used benchmark in the causal inference community, where the treatment is job training and the outcomes are income and employment status after training. The dataset includes 8 covariates such as age, education, and previous earnings. Our goal is to predict unemployment using the feature set of Dehejia and Wahba [37]. Following Shalit et al. [8], we combined the LaLonde experimental sample (297 treated, 425 control) with the PSID comparison group (2490 control).

56 papers · 1 benchmark
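To make the sample construction concrete, here is a hedged sketch of pooling the experimental sample with the PSID comparison group under a binary treatment flag; the frames and column names below are toy stand-ins, not the actual LaLonde files:

```python
import pandas as pd

# Toy stand-ins for the two samples; the real data has 8 covariates.
lalonde = pd.DataFrame({
    "age": [23, 31], "education": [10, 12], "treated": [1, 0],
})  # experimental sample: 297 treated, 425 control in the real data
psid = pd.DataFrame({"age": [45], "education": [14]})
psid["treated"] = 0  # PSID units never received job training

# Pooled evaluation sample: 297 treated vs. 425 + 2490 controls.
jobs = pd.concat([lalonde, psid], ignore_index=True)
print(jobs)
```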
Page 52 of 1000