Datasets

19,997 machine learning datasets


Contour Drawing Dataset

A new dataset of contour drawings.

Cookie

The dataset is constructed from an Amazon review corpus by integrating both user-agent dialogue and custom knowledge graphs for recommendation.

2 papers · 0 benchmarks

CO-SKEL dataset

A benchmark dataset for the co-skeletonization task.

2 papers · 0 benchmarks

COVID19-Algeria-and-World-Dataset

A coronavirus dataset covering 98 countries, compiled from several reliable sources. Each row represents a country, and the columns represent geographic, climate, healthcare, economic, and demographic factors that may accelerate or slow the spread of COVID-19. The assumptions for the different factors are as follows: …

2 papers · 0 benchmarks · Tabular

COVID-19-CT-CXR

A public database of COVID-19 chest X-ray (CXR) and CT images, automatically extracted from COVID-19-relevant articles in the PubMed Central Open Access (PMC-OA) Subset.

2 papers · 0 benchmarks

Covid-HeRA

Covid-HeRA is a dataset for health risk assessment and severity-informed decision making in the presence of COVID-19 misinformation. It is a benchmark for risk-aware detection of health misinformation related to the COVID-19 pandemic. Social media posts from Twitter are annotated according to the perceived likelihood of health behavioural changes and the perceived risks of following unreliable advice found online.

2 papers · 0 benchmarks · Texts

CQR (Contextual Query Rewrite)

CQR is an extension of the Stanford Dialogue Corpus. It contains crowd-sourced query rewrites to facilitate research in dialogue state tracking that uses natural language as the interface.

2 papers · 0 benchmarks · Texts

Creative Flow+ Dataset

Creative Flow+ includes 3,000 animated sequences rendered in styles randomly selected from 40 textured line styles and 38 shading styles, spanning the range from flat cartoon fill to wildly sketchy shading. The dataset contains more than 124K training frames and 10K test frames rendered at 1500×1500 resolution, making it far larger than the largest previously available optical flow datasets.

2 papers · 0 benchmarks

Cumulo

A benchmark dataset for training and evaluating global cloud classification models. It consists of one year of 1 km-resolution MODIS hyperspectral imagery merged with pixel-width 'tracks' of CloudSat cloud labels.

2 papers · 0 benchmarks

Curiosity

The Curiosity dataset consists of 14K dialogs (181K utterances) in which users and virtual assistants converse about geographic topics such as geopolitical entities and locations. The dialogs are annotated with pre-existing user knowledge, message-level dialog acts, fine-grained knowledge groundings to Wikipedia, and user reactions to messages.

2 papers · 0 benchmarks · Texts

Czech restaurant information

Czech restaurant information is a dataset for natural language generation (NLG) in task-oriented spoken dialogue systems, with Czech as the target language. It originated as a translation of the English San Francisco Restaurants dataset by Wen et al. (2015).

2 papers · 1 benchmark · Texts

MVTec D2S (MVTec Densely Segmented Supermarket)

MVTec D2S is a benchmark for instance-aware semantic segmentation in an industrial domain. It contains 21,000 high-resolution images with pixel-wise labels for all object instances. The objects comprise groceries and everyday products from 60 categories. The benchmark is designed to resemble the real-world setting of an automatic checkout, inventory, or warehouse system. The training images contain only objects of a single class on a homogeneous background, while the validation and test sets are much more complex and diverse.

2 papers · 0 benchmarks · Images

DAVANet

A large-scale multi-scene dataset for stereo deblurring, containing 20,637 blurry-sharp stereo image pairs from 135 diverse sequences and their corresponding bidirectional disparities.

2 papers · 0 benchmarks

DesireDB

DesireDB includes gold-standard labels for identifying statements of desire, textual evidence for desire fulfillment, and annotations of whether each stated desire is fulfilled given the evidence in the narrative context.

2 papers · 0 benchmarks · Texts

DHP19 (Dynamic Vision Sensor 3D Human Pose Dataset)

DHP19 is the first human pose dataset with data collected from DVS event cameras.

2 papers · 16 benchmarks

DialogueFairness

The Dialogue Fairness dataset is used to evaluate and understand fairness in dialogue models, focusing on gender and racial biases.

2 papers · 0 benchmarks · Texts

DIB-10K (DongNiao International Birds 10000)

DIB-10K is a challenging image dataset covering more than 10,000 different types of birds. It was created to support both machine learning and ornithology research.

2 papers · 2 benchmarks

DPC-Captions

DPC-Captions is an open-source image caption dataset for the aesthetic evaluation of images. For each image it contains comments on up to five aesthetic attributes, obtained through knowledge transfer from a fully annotated small-scale dataset.

2 papers · 0 benchmarks · Images

DSBI (Double-Sided Braille Image)

The Double-Sided Braille Image dataset (DSBI) is a large-scale dataset for Braille image recognition. It provides detailed annotations of recto dots, verso dots, and Braille cells.

2 papers · 0 benchmarks · Images

EGOK360

EGOK360 contains annotations of human activities divided into sub-actions. For example, the activity Ping-Pong has four sub-actions: pickup-ball, hit, bounce-ball, and serve.

2 papers · 0 benchmarks · Videos
