19,997 machine learning datasets
19,997 dataset results
Set12 is a collection of 12 grayscale images of different scenes that are widely used for evaluation of image denoising methods. The size of each image is 256×256.
A new dataset with abstractive dialogue summaries.
A large-scale dataset that consists of real photos captured from three different phones and one high-end reflex camera.
Functional Map of the World (fMoW) is a dataset that aims to inspire the development of machine learning models capable of predicting the functional purpose of buildings and land use from temporal sequences of satellite images and a rich set of metadata features.
MedMCQA is a large-scale, Multiple-Choice Question Answering (MCQA) dataset designed to address real-world medical entrance exam questions.
MPII Human Pose Dataset is a dataset for human pose estimation. It consists of around 25k images extracted from online videos. Each image contains one or more people, with over 40k people annotated in total. Among the 40k samples, ∼28k samples are for training and the remainder are for testing. Overall the dataset covers 410 human activities and each image is provided with an activity label. Images were extracted from a YouTube video and provided with preceding and following un-annotated frames.
NABirds V1 is a collection of 48,000 annotated photographs of the 400 species of birds that are commonly observed in North America. More than 100 photographs are available for each species, including separate annotations for males, females and juveniles that comprise 700 visual categories. This dataset is to be used for fine-grained visual categorization experiments.
The ShareGPT4V dataset is a pioneering large-scale resource that features 1.2 million highly descriptive captions. These captions surpass existing datasets in terms of diversity and information content. They cover a wide range of topics, including world knowledge, object properties, spatial relationships, and aesthetic evaluations². The dataset was introduced to address bottlenecks in large multi-modal models and enhance their performance by providing better captions².
The Pix3D dataset is a large-scale benchmark of diverse image-shape pairs with pixel-level 2D-3D alignment. Pix3D has wide applications in shape-related tasks including reconstruction, retrieval, viewpoint estimation, etc.
The Flickr30K Entities dataset is an extension to the Flickr30K dataset. It augments the original 158k captions with 244k coreference chains, linking mentions of the same entities across different captions for the same image, and associating them with 276k manually annotated bounding boxes. This is used to define a new benchmark for localization of textual entity mentions in an image.
Color BSD68 dataset for image denoising benchmarks is part of The Berkeley Segmentation Dataset and Benchmark. It is used for measuring image denoising algorithms performance. It contains 68 images.
The UCF-Crime dataset is a large-scale dataset of 128 hours of videos. It consists of 1900 long and untrimmed real-world surveillance videos, with 13 realistic anomalies including Abuse, Arrest, Arson, Assault, Road Accident, Burglary, Explosion, Fighting, Robbery, Shooting, Stealing, Shoplifting, and Vandalism. These anomalies are selected because they have a significant impact on public safety.
Aff-Wild2 is a large-scale in-the-wild database and an extension of the Aff-Wild dataset for affect recognition. It approximately doubles the number of included video frames and the number of subjects; thus, improving the variability of the included behaviors and of the involved persons. It is the only existing in-the-wild database with annotations for all 3 main behaviour tasks.
mC4 is a multilingual variant of the C4 dataset called mC4. mC4 comprises natural text in 101 languages drawn from the public Common Crawl web scrape.
The One Billion Word dataset is a dataset for language modeling. The training/held-out data was produced from the WMT 2011 News Crawl data using a combination of Bash shell and Perl scripts.
We introduce FLEURS, the Few-shot Learning Evaluation of Universal Representations of Speech benchmark. FLEURS is an n-way parallel speech dataset in 102 languages built on top of the machine translation FLoRes-101 benchmark, with approximately 12 hours of speech supervision per language. FLEURS can be used for a variety of speech tasks, including Automatic Speech Recognition (ASR), Speech Language Identification (Speech LangID), Translation and Retrieval. In this paper, we provide baselines for the tasks based on multilingual pre-trained models like mSLAM. The goal of FLEURS is to enable speech technology in more languages and catalyze research in low-resource speech understanding.
The SummEval dataset is a resource developed by the Yale LILY Lab and Salesforce Research for evaluating text summarization models. It was created as part of a project to address shortcomings in summarization evaluation methods.
Camouflaged Object (CAMO) dataset specifically designed for the task of camouflaged object segmentation. We focus on two categories, i.e., naturally camouflaged objects and artificially camouflaged objects, which usually correspond to animals and humans in the real world, respectively. Camouflaged object images consists of 1250 images (1000 images for the training set and 250 images for the testing set). Non-camouflaged object images are collected from the MS-COCO dataset (1000 images for the training set and 250 images for the testing set). CAMO has objectness mask ground-truth.
Multi30K is a large-scale multilingual multimodal dataset for interdisciplinary machine learning research. It extends the Flickr30K dataset with German translations created by professional translators over a subset of the English descriptions, and descriptions crowdsourced independently of the original English descriptions. The dataset was introduced to stimulate multilingual multimodal research.
e-SNLI is used for various goals, such as obtaining full sentence justifications of a model's decisions, improving universal sentence representations and transferring to out-of-domain NLI datasets.