19,997 machine learning datasets
The INRIA Dense Light Field Dataset (DLFD) is a dataset for testing depth estimation methods in a light field. DLFD contains 39 scenes with disparity range [-4,4] pixels. The light fields are of spatial resolution 512 x 512 and angular resolution 9 x 9.
The INRIA Sparse Light Field Dataset (SLFD) is a dataset for testing depth estimation methods in a light field. SLFD contains 53 scenes with disparity range [-20,20] pixels. The light fields are of spatial resolution 512 x 512 and angular resolution 9 x 9.
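The two INRIA light field datasets above share the same spatial and angular resolution and differ mainly in disparity range. A minimal sketch of what those numbers imply is given here. It assumes the disparity is measured per step between adjacent views, which the descriptions do not state; the class and method names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LightFieldSpec:
    """Summary of a light field dataset as described in the listing."""
    name: str
    scenes: int
    spatial: tuple    # (height, width) of each view, in pixels
    angular: tuple    # (rows, cols) of the view grid
    disparity: tuple  # (min, max) in pixels; ASSUMED to be per adjacent-view step

    def num_views(self) -> int:
        # Total number of sub-aperture views per scene.
        return self.angular[0] * self.angular[1]

    def max_shift_from_centre(self) -> int:
        # Largest horizontal pixel shift between the centre view and a view
        # on the edge of the grid, under the per-step disparity assumption.
        steps = (self.angular[1] - 1) // 2
        return steps * max(abs(self.disparity[0]), abs(self.disparity[1]))


DLFD = LightFieldSpec("DLFD", 39, (512, 512), (9, 9), (-4, 4))
SLFD = LightFieldSpec("SLFD", 53, (512, 512), (9, 9), (-20, 20))

for spec in (DLFD, SLFD):
    print(spec.name, spec.num_views(), spec.max_shift_from_centre())
```

Under that assumption, the sparse dataset requires matching over shifts five times larger than the dense one, which is what makes it the harder depth estimation test.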
The AIDS Antiviral Screen dataset is a dataset of screens checking tens of thousands of compounds for evidence of anti-HIV activity. The available screen results are chemical graph-structured data of these various compounds.
The Daimler Monocular Pedestrian Detection dataset is a dataset for pedestrian detection in urban environments. The training set contains 15560 pedestrian samples (image cut-outs at 48×96 resolution) and 6744 additional full images without pedestrians for extracting negative samples. The test set contains an independent sequence with more than 21790 images and 56492 pedestrian labels (fully visible or partially occluded), captured from a vehicle during a 27-minute drive through urban traffic.
The ETHZ Shape dataset contains images of five diverse shape-based classes, collected from Flickr and Google Images. The main challenges it offers are clutter, intra-class shape variability, and scale changes. The authors deliberately selected several images where the object comprises only a rather small portion of the image, and made an effort to include objects appearing at a wide range of scales. The objects are mostly unoccluded and are all taken from approximately the same viewpoint (the side).
Short BBC Pose contains five one-hour-long videos of sign language signers, each with a different sleeve length (in contrast to BBC Pose and Extended BBC Pose, which only contain signers with moderately long sleeves). Each of the five videos has 200 test frames (which have been manually annotated with joint locations), amounting to 1,000 test frames in total. Test frames were selected by the authors to contain a diverse range of poses.
The CAL500 Expansion (CAL500exp) dataset is an enriched version of the CAL500 music information retrieval dataset. CAL500exp is designed to facilitate music auto-tagging at a smaller temporal scale. The dataset consists of the same songs split into 3,223 acoustically homogeneous segments of 3 to 16 seconds. The tag labels are annotated at the segment level instead of the track level. The annotations were obtained from annotators with a strong music background.
The CAL10K dataset (introduced as Swat10k) contains 10,870 songs that are weakly-labelled using a tag vocabulary of 475 acoustic tags and 153 genre tags. The tags have all been harvested from Pandora’s website and result from song annotations performed by expert musicologists involved with the Music Genome Project.
The IT Translation Task is a shared task introduced in the First Conference on Machine Translation. Compared to WMT 2016 News, this task brought several novelties to WMT:
The Biomedical Translation Shared Task was first introduced at the First Conference on Machine Translation. The task aims to evaluate systems for the translation of biomedical titles and abstracts from scientific publications. The data includes three language pairs (English ↔ Portuguese, English ↔ Spanish, English ↔ French) and two sub-domains of biological sciences and health sciences.
The Medical Translation Task of WMT 2014 addresses the problem of domain-specific and genre-specific machine translation. The task is split into two subtasks: summary translation, focused on translation of sentences from summaries of medical articles, and query translation, focused on translation of queries entered by users into medical information search engines. Both subtasks included translation between English and Czech, German, and French, in both directions.
News translation is a recurring WMT task. The test set is a collection of parallel corpora consisting of about 1500 English sentences translated into 5 languages (Czech, German, Finnish, French, Russian) and an additional 1500 sentences from each of the 5 languages translated into English. The sentences are taken from newspaper articles for each language pair, except for French, where the test set was drawn from user-generated comments on news articles (from The Guardian and Le Monde). The translation was done by professional translators.
The LinkedResults dataset contains around 1,600 results capturing the performance of machine learning models, extracted from tables in 239 papers. All tables come from a subset of the SegmentedTables dataset. Each result is a tuple of the form (task, dataset, metric name, metric value) and is linked to the particular table, row and cell it originates from.
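The record layout described for LinkedResults can be sketched as a pair of small data classes. This is an illustration only: the class names, field names, and the example values are hypothetical placeholders, not the dataset's actual schema or contents.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CellRef:
    """Where a result was found (hypothetical field names)."""
    table_id: str  # identifier of the source table
    row: int       # row index within that table
    col: int       # column index of the cell within that row


@dataclass(frozen=True)
class LinkedResult:
    """One (task, dataset, metric name, metric value) tuple plus its origin."""
    task: str
    dataset: str
    metric_name: str
    metric_value: float
    source: CellRef

    def as_tuple(self) -> tuple:
        # The four-element form given in the description.
        return (self.task, self.dataset, self.metric_name, self.metric_value)


# Placeholder values, made up purely to show the shape of a record.
example = LinkedResult(
    task="example-task",
    dataset="example-dataset",
    metric_name="example-metric",
    metric_value=0.5,
    source=CellRef(table_id="example-table", row=3, col=4),
)
print(example.as_tuple(), example.source)
```

Keeping the cell reference separate from the tuple lets the same result tuple be compared across papers while still being traceable to its origin.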
The VGG Cell dataset (made up entirely of synthetic images) is the main public benchmark used to compare cell counting techniques.
SceneNet-RGBD is a synthetic dataset containing large-scale photorealistic renderings of indoor scene trajectories with pixel-level annotations. Random sampling permits virtually unlimited scene configurations, and the dataset creators provide a set of 5M rendered RGB-D images from over 15K trajectories in synthetic layouts with random but physically simulated object poses. Each layout also has random lighting, camera trajectories, and textures. The scale of this dataset is well suited for pre-training data-driven computer vision techniques from scratch with RGB-D inputs, which has previously been limited by the relatively small labelled datasets NYUv2 and SUN RGB-D. It also provides a basis for investigating 3D scene labelling tasks by providing perfect camera poses and depth data as a proxy for a SLAM system.
The Freiburg Street Crossing dataset consists of data collected from three different street crossings in Freiburg, Germany; two of which were traffic light regulated intersections and one a zebra crossing without traffic lights. The data can be used to train agents to cross roads autonomously.
Freiburg Terrains consists of three parts: 3.7 hours of audio recordings from a microphone pointed at the robot's wheels, 24K RGB images from a camera mounted on top of the robot, and the SLAM poses for each data collection run. The dataset can be used for terrain classification, which is useful for agent navigation tasks.
DeepLocCross is a localization dataset that contains RGB-D stereo images captured at 1280 x 720 pixels at a rate of 20 Hz. The ground-truth pose labels are generated using a LiDAR-based SLAM system. In addition to the 6-DoF localization poses of the robot, the dataset contains tracked detections of the observable dynamic objects. Each tracked object is identified using a unique track ID, spatial coordinates, velocity and orientation angle. Furthermore, as the dataset contains multiple pedestrian crossings, labels at each intersection indicating its safety for crossing are provided. This dataset consists of seven training sequences with a total of 2264 images, and three testing sequences with a total of 930 images. The dynamic nature of the surrounding environment in which the dataset was captured renders the tasks of localization and visual odometry estimation extremely challenging due to the varying weather conditions, presence of shadows and motion blur caused by the mov
The Caltech Resident-Intruder Mouse dataset (CRIM13) consists of 237x2 videos (recorded with synchronized top and side views) of pairs of mice engaging in social behavior, catalogued into thirteen different actions. Each video lasts ~10 min, for a total of 88 hours of video and 8 million frames. A team of behavior experts annotated each video on a frame-by-frame basis for a state-of-the-art study of the neurophysiological mechanisms involved in aggression and courtship in mice.
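The totals quoted for CRIM13 imply a mean video length and a frame rate, which can be checked with a few lines of arithmetic. These are figures derived from the rounded numbers in the description, not values stated by the dataset authors, and the variable names are illustrative.

```python
# Figures taken from the description above (all rounded in the source).
videos = 237 * 2          # top and side view for each of 237 recordings
total_hours = 88
total_frames = 8_000_000

# Mean length per video, in minutes, if the 88 hours cover all videos.
mean_minutes = total_hours * 60 / videos

# Frame rate implied by the total frame count and total duration.
implied_fps = total_frames / (total_hours * 3600)

print(f"{videos} videos, ~{mean_minutes:.1f} min each, ~{implied_fps:.1f} fps")
```

The result is a little over 11 minutes per video, consistent with the approximate 10-minute figure, and a frame rate of roughly 25 frames per second.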
NIST Special Database 19 contains NIST's entire corpus of training materials for handprinted document and character recognition. It publishes Handprinted Sample Forms from 3600 writers, 810,000 character images isolated from their forms, ground truth classifications for those images, reference forms for further data collection, and software utilities for image management and handling.