19,997 machine learning datasets
Consists of two pedestrian trajectory datasets, the CITR dataset and the DUT dataset, intended for further calibrating and verifying pedestrian motion models, especially where the influence of vehicles on pedestrians plays an important role.
The Multimodal Document Intent Dataset (MDID) is a dataset for computing author intent from multimodal data from Instagram. It contains 1,299 Instagram posts covering a variety of topics, annotated with labels from three taxonomies. The samples are labelled with 7 intent labels: Provocative, Informative, Advocative, Entertainment, Expositive, Expressive, and Promotive.
ADE-Affordance is a dataset built upon ADE20k that contains annotations enabling rich visual reasoning.
Large Age-Gap (LAG) is a dataset for face verification. The dataset contains 3,828 images of 1,010 celebrities. For each identity, at least one child/young image and one adult/old image are present.
Expanded Groove MIDI dataset (E-GMD) is an automatic drum transcription (ADT) dataset that contains 444 hours of audio from 43 drum kits, making it an order of magnitude larger than similar datasets, and the first with human-performed velocity annotations.
This is a dataset for segmentation and classification of epistemic activities in diagnostic reasoning texts.
Ford Campus Vision and Lidar Data Set is a dataset collected by an autonomous ground vehicle testbed, based upon a modified Ford F-250 pickup truck. The vehicle is outfitted with a professional (Applanix POS LV) and a consumer (Xsens MTI-G) Inertial Measurement Unit (IMU), a Velodyne 3D-lidar scanner, two push-broom forward-looking Riegl lidars, and a Point Grey Ladybug3 omnidirectional camera system.
MERL Shopping is a dataset for training and testing action detection algorithms. The MERL Shopping Dataset consists of 106 videos, each of which is a sequence about 2 minutes long. The videos are from a fixed overhead camera looking down at people shopping in a grocery store setting. Each video contains several instances of the following 5 actions: "Reach To Shelf" (reach hand into shelf), "Retract From Shelf" (retract hand from shelf), "Hand In Shelf" (extended period with hand in the shelf), "Inspect Product" (inspect product while holding it in hand), and "Inspect Shelf" (look at shelf while not touching or reaching for the shelf).
KITTI is a well-established dataset in the computer vision community. It has often been used for trajectory prediction despite lacking a well-defined split, which leads to non-comparable baselines across works. This dataset aims to bridge this gap by proposing a well-defined split of the KITTI data. Samples are collected as 6-second chunks (2 seconds of past and 4 seconds of future) in a sliding-window fashion from all trajectories in the dataset, including the ego vehicle. There are a total of 8,613 top-view trajectories for training and 2,907 for testing. Since top-view maps are not provided by KITTI, semantic labels of static categories obtained with DeepLab-v3+ from all frames are projected onto a common top-view map using the Velodyne 3D point cloud and IMU. The resulting maps have a spatial resolution of 0.5 meters and are provided along with the trajectories.
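The sliding-window chunking described above can be sketched as follows. This is only an illustration, not the dataset authors' code: the function name `sliding_chunks`, the stride of one frame, and the 10 Hz sampling rate used to turn seconds into frame counts are all assumptions.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def sliding_chunks(
    trajectory: Sequence[Point],
    past_len: int,
    future_len: int,
    stride: int = 1,
) -> List[Tuple[List[Point], List[Point]]]:
    """Cut one trajectory into overlapping (past, future) samples."""
    window = past_len + future_len
    chunks = []
    # A trajectory shorter than one window yields an empty range, hence no samples.
    for start in range(0, len(trajectory) - window + 1, stride):
        past = list(trajectory[start:start + past_len])
        future = list(trajectory[start + past_len:start + window])
        chunks.append((past, future))
    return chunks


# Assumed sampling rate (hypothetical here): 10 frames per second,
# so 2 s of past is 20 frames and 4 s of future is 40 frames.
FPS = 10
toy_trajectory = [(float(i), 0.0) for i in range(70)]
samples = sliding_chunks(toy_trajectory, past_len=2 * FPS, future_len=4 * FPS)
print(len(samples), "samples from a 70-frame toy trajectory")
```

A larger stride would reduce the overlap between consecutive samples; the description does not state which stride was used.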
The second Ninapro database includes 40 intact subjects and is thoroughly described in the paper: "Manfredo Atzori, Arjan Gijsberts, Claudio Castellini, Barbara Caputo, Anne-Gabrielle Mittaz Hager, Simone Elsig, Giorgio Giatsidis, Franco Bassetto & Henning Müller. Electromyography data for non-invasive naturally-controlled robotic hand prostheses. Scientific Data, 2014" (http://www.nature.com/articles/sdata201453). Please cite this paper for any work related to the Ninapro database. Please also refer to the paper by Gijsberts et al., 2014 (http://publications.hevs.ch/index.php/publications/show/1629) for more information about the database.
The BuzzFeed-Webis Fake News Corpus 16 comprises the output of 9 publishers in a week close to the US elections. Among the selected publishers are 6 prolific hyperpartisan ones (three left-wing and three right-wing), and three mainstream publishers (see Table 1). All publishers earned Facebook’s blue checkmark, indicating authenticity and an elevated status within the network. For seven weekdays (September 19 to 23 and September 26 and 27), every post and linked news article of the 9 publishers was fact-checked by professional journalists at BuzzFeed. In total, 1,627 articles were checked, 826 mainstream, 256 left-wing and 545 right-wing. The imbalance between categories results from differing publication frequencies.
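The class imbalance mentioned above can be quantified directly from the article counts given in the description. The snippet below is a small sketch; the dictionary keys are illustrative labels, not identifiers from the corpus files.

```python
# Article counts per publisher orientation, as stated in the description.
counts = {"mainstream": 826, "left-wing": 256, "right-wing": 545}

total = sum(counts.values())
shares = {label: n / total for label, n in counts.items()}

for label, share in shares.items():
    print(f"{label}: {counts[label]} articles ({share:.1%})")
print("total:", total)
```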
FakeNewsAMT & Celebrity are two novel datasets for the task of fake news detection, covering seven different news domains.
The Individual Brain Charting (IBC) project aims at providing a new generation of functional-brain atlases. To map cognitive mechanisms at a fine scale, task-fMRI data at high spatial resolution are being acquired on a fixed cohort of 12 participants while they perform many different tasks. These data, free from both inter-subject and inter-site variability, are publicly available as a means to support the investigation of functional segregation and connectivity as well as individual variability, with a view to establishing a better link between brain systems and behavior.
This database includes 25 long-term ECG recordings of human subjects with atrial fibrillation (mostly paroxysmal).
The Parsing Time Normalizations (PNT) corpus in SCATE format allows the representation of a wider variety of time expressions than previous approaches. This corpus was released with SemEval 2018 Task 6.
WHU-Specular is a large dataset of annotated specular highlight regions created from real-world images. It can be used for the specular highlight detection task. It contains 4,310 image pairs (specular images and corresponding highlight masks). We randomly selected 3,017 images as the training set and the other 1,293 images as the testing set.
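A random split with these sizes can be sketched as below. This does not reproduce the authors' actual split: the helper `split_indices`, the seed, and the use of integer indices in place of image files are assumptions made for illustration.

```python
import random
from typing import List, Tuple


def split_indices(n_items: int, n_train: int, seed: int = 0) -> Tuple[List[int], List[int]]:
    """Shuffle item indices reproducibly and cut them into train and test parts."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # local RNG, global random state untouched
    return sorted(indices[:n_train]), sorted(indices[n_train:])


# Sizes taken from the description: 4,310 pairs, 3,017 of them for training.
train_idx, test_idx = split_indices(n_items=4310, n_train=3017)
print(len(train_idx), "training pairs,", len(test_idx), "testing pairs")
```

Fixing the seed keeps the split stable across runs, which matters when results from different experiments are compared.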
All Words Open IE (AW-OIE) is an open information extraction dataset derived from the Question-Answer Meaning Representation (QAMR) dataset.
The RailEye3D dataset, a collection of train-platform scenarios for applications targeting passenger safety and automation of train dispatching, consists of 10 image sequences captured at 6 railway stations in Austria. Annotations for multi-object tracking are provided both in a unified format and in the ground-truth format used in the MOTChallenge.
This database of simulated arterial pulse waves is designed to be representative of a sample of pulse waves measured from healthy adults. It contains pulse waves for 4,374 virtual subjects, aged 25 to 75 years (in 10-year increments). The database contains a baseline set of pulse waves for each of the six age groups, created using cardiovascular properties (such as heart rate and arterial stiffness) that are representative of healthy subjects in each age group. It also contains 728 further virtual subjects in each age group, in which each of the cardiovascular properties is varied within normal ranges. This allows for extensive in silico analyses of haemodynamics and the performance of pulse wave analysis algorithms.
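The stated total is consistent with the per-group figures, as the short check below shows. It assumes that the baseline set contributes exactly one virtual subject per age group, which the description implies but does not state outright.

```python
# Age groups from 25 to 75 years in 10-year increments.
age_groups = list(range(25, 76, 10))

baseline_per_group = 1      # assumption: one baseline subject per age group
variations_per_group = 728  # further subjects per age group, from the description

subjects_per_group = baseline_per_group + variations_per_group
total_subjects = len(age_groups) * subjects_per_group

print(len(age_groups), "age groups x", subjects_per_group, "subjects =", total_subjects)
```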
The “Medico automatic polyp segmentation challenge” aims to develop computer-aided diagnosis systems for automatic polyp segmentation to detect all types of polyps (for example, irregular polyp, smaller or flat polyps) with high efficiency and accuracy. The main goal of the challenge is to benchmark semantic segmentation algorithms on a publicly available dataset, emphasizing robustness, speed, and generalization.