19,997 machine learning datasets
19,997 dataset results
Comprises of 171,191 video segments from 346 high-quality soccer games. The database contains 702,096 bounding boxes, 37,709 essential event labels with time boundary and 17,115 highlight annotations for object detection, action recognition, temporal action localization, and highlight detection tasks.
A large-scale evaluation set that provides human ratings for the plausibility of 10,000 SP pairs over five SP relations, covering 2,500 most frequent verbs, nouns, and adjectives in American English.
Specially designed to evaluate active learning for video object detection in road scenes.
The Taskmaster-2 dataset consists of 17,289 dialogs in seven domains: restaurants (3276), food ordering (1050), movies (3047), hotels (2355), flights (2481), music (1602), and sports (3478).
A collection of 2511 recipes for zero-shot learning, recognition and anticipation.
Tencent ML-Images is a large open-source multi-label image database, including 17,609,752 training and 88,739 validation image URLs, which are annotated with up to 11,166 categories.
Twitter100k is a large-scale dataset for weakly supervised cross-media retrieval.
UIT-ViNewsQA is a new corpus for the Vietnamese language to evaluate healthcare reading comprehension models. The corpus comprises 22,057 human-generated question-answer pairs. Crowd-workers create the questions and their answers based on a collection of over 4,416 online Vietnamese healthcare news articles, where the answers comprise spans extracted from the corresponding articles.
A large-scale corpus for phonetic typology, with aligned segments and estimated phoneme-level labels in 690 readings spanning 635 languages, along with acoustic-phonetic measures of vowels and sibilants.
Embrapa Wine Grape Instance Segmentation Dataset (WGISD) contains grape clusters properly annotated in 300 images and a novel annotation methodology for segmentation of complex objects in natural images.
WWW Crowd provides 10,000 videos with over 8 million frames from 8,257 diverse scenes, therefore offering a comprehensive dataset for the area of crowd understanding.
The Sarcasm Corpus contains sarcastic and non-sarcastic utterances of three different types, which are balanced with half of the samples being sarcastic and half non-sarcastic. The three types are:
A multilingual image dataset with spatial relation annotations and object features for image-to-text generation, built using 2,026 images from the PASCAL VOC2008 dataset.
This dataset contains 114 individuals including 1824 images captured from two disjoint camera views. For each person, eight images are captured from eight different orientations under one camera view and are normalized to 128x48 pixels. This dataset is also split into two parts randomly. One contains 57 individuals for training, and the other contains 57 individuals for testing.
Blocksworld Image Reasoning Dataset (BIRD) contains images of wooden blocks in different configurations, and the sequence of moves to rearrange one configuration to the other.
A Large Dataset of Object Scans is a dataset of more than ten thousand 3D scans of real objects. To create the dataset, the authors recruited 70 operators, equipped them with consumer-grade mobile 3D scanning setups, and paid them to scan objects in their environments. The operators scanned objects of their choosing, outside the laboratory and without direct supervision by computer vision professionals. The result is a large and diverse collection of object scans: from shoes, mugs, and toys to grand pianos, construction vehicles, and large outdoor sculptures. The authors worked with an attorney to ensure that data acquisition did not violate privacy constraints. The acquired data was placed in the public domain and is available freely.
Multi-Sensor All Weather Mapping (MSAW) is a dataset and challenge, which features two collection modalities (both SAR and optical). The dataset and challenge focus on mapping and building footprint extraction using a combination of these data sources. MSAW covers 120 km^2 over multiple overlapping collects and is annotated with over 48,000 unique building footprints labels, enabling the creation and evaluation of mapping algorithms for multi-modal data.
StreetStyle is a large-scale dataset of photos of people annotated with clothing attributes, and use this dataset to train attribute classifiers via deep learning.
ELAS is a dataset for lane detection. It contains more than 20 different scenes (in more than 15,000 frames) and considers a variety of scenarios (urban road, highways, traffic, shadows, etc.). The dataset was manually annotated for several events that are of interest for the research community (i.e., lane estimation, change, and centering; road markings; intersections; LMTs; crosswalks and adjacent lanes).
The ObjectsRoom dataset is based on the MuJoCo environment used by the Generative Query Network 4 and is a multi-object extension of the 3d-shapes dataset. The training set contains 1M scenes with up to three objects. We also provide ~1K test examples for the following variants: