19,997 machine learning datasets
19,997 dataset results
A dense crowd dataset with manually annotated groundtruth, collected from different public datasets. This dataset comprises 20 videos that exhibit a multitude of motion behaviors that cover both the obvious and subtle instabilities.
Contributes dataset: (1) reviewing the dynamics behind saliency and crowds. (2) using eye tracking to create a dynamic human eye fixation dataset over a new set of crowd videos gathered from the Internet. The videos are annotated into three distinct density levels.
CUHK-QA is a dataset for natural language-based person search using iterative questioning.
The Curated AFD dataset is a curated version of the Asian Face Dataset (AFD) for face recognition research. The original AFD dataset has been curated to remove wrong identity labels, duplicate images and duplicate subjects.
The Curation Corpus is a collection of 40,000 professionally-written summaries of news articles, with links to the articles themselves.
Czech-English parallel corpus CzEng 2.0 consisting of over 2 billion words (2 "gigawords") in each language. The corpus contains document-level information and is filtered with several techniques to lower the amount of noise.
A large-scale comprehensive collection of dashcam videos collected by vehicles on DiDi's platform. D2-City contains more than 10000 video clips which deeply reflect the diversity and complexity of real-world traffic scenarios in China.
A large benchmark dataset containing 50K human judgments for 5K distinct sentence pairs in the English dative alternation. This dataset includes 200 unique verbs and systematically varies the definiteness and length of arguments.
A line drawing restoration dataset which consists of 71 line drawing sketches by Leonardo Da Vinci.
The dataset provides the content of all articles for 128 Wikipedia languages. The dataset has been further enriched with about 25% more links and selected partitions published as Linked Data.
A dataset of stance-labeled GW sentences.
DET is a lane detection dataset that consists of the raw event data, accumulated images over 30ms and corresponding lane labels. Contains 17,103 lane instances, each of which is labeled pixel by pixel manually.
Contains Diabetic Foot Ulcers (DFU) from different patients.
Extracts diseases and syndromes (DsSs) from more than 65,000 neurology case reports from 66 journals in PubMed over the last six decades from 1955 to 2017.
DLBCL-Morph is a dataset containing 42 digitally scanned high-resolution tissue microarray (TMA) slides accompanied by clinical, cytogenetic, and geometric features from 209 DLBCL cases.
DpgMedia2019 is a Dutch news dataset for partisanship detection. It contains more than 100K articles that are labelled on the publisher level and 776 articles that were crowdsourced using an internal survey platform and labelled on the article level.
This dataset contains videos where a flying drone (hexacopter) is captured with multiple consumer-grade cameras (smartphones, compact cameras, gopro,...) with highly accurate 3D drone trajectory ground truth recorderd by a precise real-time RTK system from Fixposition. In some videos, the ground truth temporal synchronization and ground truth camera locations are also provided.
Edge-Map-345C is a large-scale edge-map dataset including 290,281 edge-maps corresponding to 345 object categories of QuickDraw dataset. In particular, these 345 categories are corresponding to the 345 free-hand sketch categories of Google QuickDraw dataset.
Edina-DR is a novel corpus of discourse relation pairs; the first of its kind to attempt to identify the discourse relations connecting the dialogic turns in open-domain discourse.
Contains 350 tweets with more than 8,000 words including 3,000 unique words written in Egyptian dialect. The tweets have much dialectal content covering most of dialectal Egyptian phonological, morphological, and syntactic phenomena. It also includes Twitter-specific aspects of the text, such as #hashtags, @mentions, emoticons and URLs.