19,997 machine learning datasets
19,997 dataset results
Fluent Speech Commands is an open source audio dataset for spoken language understanding (SLU) experiments. Each utterance is labeled with "action", "object", and "location" values; for example, "turn the lights on in the kitchen" has the label {"action": "activate", "object": "lights", "location": "kitchen"}. A model must predict each of these values, and a prediction for an utterance is deemed to be correct only if all values are correct.
Dark Zurich is an image dataset containing a total of 8779 images captured at nighttime, twilight, and daytime, along with the respective GPS coordinates of the camera for each image. These GPS annotations are used to construct cross-time-of-day correspondences, i.e., to match each nighttime or twilight image to its daytime counterpart.
Assembly101 is a new procedural activity dataset featuring 4321 videos of people assembling and disassembling 101 "take-apart" toy vehicles. Participants work without fixed instructions, and the sequences feature rich and natural variations in action ordering, mistakes, and corrections. Assembly101 is the first multi-view action dataset, with simultaneous static (8) and egocentric (4) recordings. Sequences are annotated with more than 100K coarse and 1M fine-grained action segments, and 18M 3D hand poses. We benchmark on three action understanding tasks: recognition, anticipation and temporal segmentation. Additionally, we propose a novel task of detecting mistakes. The unique recording format and rich set of annotations allow us to investigate generalization to new toys, cross-view transfer, long-tailed distributions, and pose vs. appearance. We envision that Assembly101 will serve as a new challenge to investigate various activity understanding problems.
The largest and most diverse dataset for lifelong place recognition from image sequences in urban and suburban settings.
Open-X-Embodiment robot manipulation dataset, see https://robotics-transformer-x.github.io/
VOT2017 is a Visual Object Tracking dataset for different tasks that contains 60 short sequences annotated with 6 different attributes.
PA-100K is a recent-proposed large pedestrian attribute dataset, with 100,000 images in total collected from outdoor surveillance cameras. It is split into 80,000 images for the training set, and 10,000 for the validation set and 10,000 for the test set. This dataset is labeled by 26 binary attributes. The common features existing in both selected dataset is that the images are blurry due to the relatively low resolution and the positive ratio of each binary attribute is low.
The Recognizing Textual Entailment (RTE) datasets come from a series of textual entailment challenges. Data from RTE1, RTE2, RTE3 and RTE5 is combined. Examples are constructed based on news and Wikipedia text.
CoNLL++ is a corrected version of the CoNLL03 NER dataset where 5.38% of the test sentences have been fixed.
UCI Machine Learning Repository is a collection of over 550 datasets.
Consists of over one million high-resolution images of varying gaze under extreme head poses. The dataset is collected from 110 participants with a custom hardware setup including 18 digital SLR cameras and adjustable illumination conditions, and a calibrated system to record ground truth gaze targets.
MINC is a large-scale, open dataset of materials in the wild.
A large real-world event-based dataset for object classification.
MasakhaNER is a collection of Named Entity Recognition (NER) datasets for 10 different African languages. The languages forming this dataset are: Amharic, Hausa, Igbo, Kinyarwanda, Luganda, Luo, Nigerian-Pidgin, Swahili, Wolof, and Yorùbá.
The Re-TACRED dataset is a significantly improved version of the TACRED dataset for relation extraction. Using new crowd-sourced labels, Re-TACRED prunes poorly annotated sentences and addresses TACRED relation definition ambiguity, ultimately correcting 23.9% of TACRED labels. This dataset contains over 91 thousand sentences spread across 40 relations. Dataset presented at AAAI 2021.
From scientific research to commercial applications, eye tracking is an important tool across many domains. Despite its range of applications, eye tracking has yet to become a pervasive technology. We believe that we can put the power of eye tracking in everyone's palm by building eye tracking software that works on commodity hardware such as mobile phones and tablets, without the need for additional sensors or devices. We tackle this problem by introducing GazeCapture, the first large-scale dataset for eye tracking, containing data from over 1450 people consisting of almost $2.5M$ frames. Using GazeCapture, we train iTracker, a convolutional neural network for eye tracking, which achieves a significant reduction in error over previous approaches while running in real time (10 - 15fps) on a modern mobile device. Our model achieves a prediction error of 1.7cm and 2.5cm without calibration on mobile phones and tablets respectively. With calibration, this is reduced to 1.3cm and 2.1cm. Fu
Data Set Information: Extraction was done by Barry Becker from the 1994 Census database. A set of reasonably clean records was extracted using the following conditions: ((AAGE>16) && (AGI>100) && (AFNLWGT>1)&& (HRSWK>0))
This dataset contains images of unusual dangers which can be encountered by a vehicle on the road – animals, rocks, traffic cones and other obstacles. Its purpose is testing autonomous driving perception algorithms in rare but safety-critical circumstances.
BEAT has i) 76 hours, high-quality, multi-modal data captured from 30 speakers talking with eight different emotions and in four different languages, ii) 32 millions frame-level emotion and semantic relevance annotations. Our statistical analysis on BEAT demonstrates the correlation of conversational gestures with \textit{facial expressions}, \textit{emotions}, and \textit{semantics}, in addition to the known correlation with \textit{audio}, \textit{text}, and \textit{speaker identity}. Based on this observation, we propose a baseline model, \textbf{Ca}scaded \textbf{M}otion \textbf{N}etwork \textbf{(CaMN)}, which consists of above six modalities modeled in a cascaded architecture for gesture synthesis. To evaluate the semantic relevancy, we introduce a metric, Semantic Relevance Gesture Recall (\textbf{SRGR}). Qualitative and quantitative experiments demonstrate metrics' validness, ground truth data quality, and baseline's state-of-the-art performance. To the best of our knowledge,
The Jobs dataset by LaLonde [36] is a widely used benchmark in the causal inference community, where the treatment is job training and the outcomes are income and employment status after training. The dataset includes 8 covariates such as age, education, and previous earnings. Our goal is to predict unemployment, using the feature set of Dehejia and Wahba [37]. Following Shalit et al. 8, we combined the LaLonde experimental sample (297 treated, 425 control) with the PSID comparison group (2490 control).