Datasets

19,997 machine learning datasets

CapGaze

Consists of eye movements and verbal descriptions recorded synchronously over images.

Polish CDSCorpus

Consists of 10K sentence pairs that are human-annotated for semantic relatedness and entailment. The dataset may be used to evaluate compositional distributional semantics models of Polish.

1 paper · 0 benchmarks

CECW (Colorful Extended Cleanup World)

The CECW dataset is a color-extended version of the Cleanup World (CW) borrowed from the mobile-manipulation robot domain. CW is a world containing a movable object and four rooms in four colors ("blue," "green," "red," and "yellow"), designed as a simulation environment in which the agent acts on the instructions it receives. CW uses a particular Geometric Linear Temporal Logic (GLTL) to parse commands according to their grammatical syntax, resulting in a total of 3,382 commands reflecting 39 GLTL expressions.

1 paper · 0 benchmarks · Texts

Cervix93 Cytology Dataset

The dataset has 93 image stacks and their corresponding Extended Depth of Field (EDF) images, acquired from cases graded Negative, LSIL, or HSIL (The Bethesda System):

- Negative: 16
- LSIL: 46
- HSIL: 31

The ground truth includes the grade labels for each frame and manually marked points inside cervical cells in each frame. In total, there are 2,705 manually marked points across all frames:

- Negative: 238
- LSIL: 1536
- HSIL: 931

1 paper · 0 benchmarks · Images, Medical

Chinese AI and Law (CAIL) 2018

Large-scale Chinese legal dataset for judgment prediction. The dataset contains more than 2.6 million criminal cases published by the Supreme People's Court of China, making it several times larger than other datasets in existing work on judgment prediction.

1 paper · 0 benchmarks · Texts

Chinese Literature NER RE

Chinese Literature NER RE is a Discourse-Level Named Entity Recognition and Relation Extraction Dataset for Chinese Literature Text. It is constructed from hundreds of Chinese literature articles.

1 paper · 0 benchmarks · Texts

Chinese Traditional Painting dataset

The Chinese Traditional Painting dataset for style transfer contains 1000 content images and 100 style images. The content images are mostly photorealistic scenes of mountains, lakes, rivers, bridges, and buildings in regions south of the Yangtze River. It includes not only scenes of China but also pictures of the Rhine, the Alps, Yellowstone, the Grand Canyon, etc. The content images include diverse types of Chinese traditional paintings.

1 paper · 0 benchmarks · Images

CITR Dataset

The CITR dataset consists of experimentally designed fundamental vehicle-crowd interaction (VCI) scenarios (front, back, and lateral VCIs) and provides a unique ID for each pedestrian, which makes it suitable for exploring a specific aspect of VCI. The DUT dataset gives two ordinary and natural VCI scenarios on a crowded university campus, which can be used for more general-purpose VCI exploration.

1 paper · 0 benchmarks

CLAD (Complex and Long Activities Dataset)

CLAD (Complex and Long Activities Dataset) is an activity dataset that exhibits real-life, diverse scenarios of complex, temporally extended human activities and actions. The dataset consists of a set of videos of actors performing everyday activities in a natural and unscripted manner. It was recorded using a static Kinect 2 sensor, which is commonly used on many robotic platforms. The dataset comprises RGB-D images, point cloud data, and automatically generated skeleton tracks, in addition to crowdsourced annotations.

1 paper · 0 benchmarks · Point cloud, RGB-D, Videos

CloudCast (CloudCast: A Satellite-Based Dataset and Baseline for Forecasting Clouds)

CloudCast is a satellite-based dataset of 70,080 images with 10 different cloud types for multiple layers of the atmosphere, annotated at the pixel level. The spatial resolution of the dataset is 928 × 1530 pixels (3 × 3 km per pixel), with 15-minute intervals between frames for the period January 1, 2017, to December 31, 2018. All frames are centered and projected over Europe.
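As a quick sanity check on the figures above, the image count matches one frame every 15 minutes over the stated two-year period. The sketch below assumes an uninterrupted series with no missing frames, which the description does not state; the variable names are illustrative.

```python
from datetime import datetime, timedelta

# Period stated in the description; the end bound is exclusive,
# so it is the first instant after December 31, 2018.
period_start = datetime(2017, 1, 1)
period_end = datetime(2019, 1, 1)

# Stated interval between consecutive frames.
frame_interval = timedelta(minutes=15)

# Assumption: exactly one frame per 15-minute slot, with no gaps.
n_frames = (period_end - period_start) // frame_interval

print(n_frames)
```

Under that assumption the result is 730 days × 96 frames per day, which agrees with the 70,080 images reported for the dataset.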

1 paper · 0 benchmarks

CMCNC (Coherent Multiple Choice Narrative Cloze)

The Coherent Multiple Choice Narrative Cloze (CMCNC) dataset is an evaluation dataset for the multi-choice narrative cloze task, where the goal is to distinguish which event has been held out from a document from a small set of randomly drawn events.
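To illustrate the task format described above, the following minimal sketch builds one multiple-choice cloze instance from a toy event chain. The function name, the event strings, and the number of choices are all hypothetical; this is not the procedure used to construct CMCNC.

```python
import random


def make_cloze_instance(events, candidate_pool, num_choices=5, rng=None):
    """Hold out one event and mix it with randomly drawn distractors.

    Hypothetical helper for illustration only.
    """
    rng = rng or random.Random()

    # Pick the event to hold out; the remaining events form the context.
    held_out_idx = rng.randrange(len(events))
    answer = events[held_out_idx]
    context = events[:held_out_idx] + events[held_out_idx + 1:]

    # Distractors are drawn at random from events not already in the document.
    distractor_pool = [e for e in candidate_pool if e != answer and e not in context]
    distractors = rng.sample(distractor_pool, num_choices - 1)

    # Shuffle so that the position of the correct answer carries no signal.
    choices = distractors + [answer]
    rng.shuffle(choices)
    return {
        "context": context,
        "choices": choices,
        "answer_index": choices.index(answer),
    }


# Toy usage with made-up events.
document_events = ["enter restaurant", "order food", "eat meal", "pay bill"]
other_events = ["board plane", "sign contract", "feed dog", "open umbrella", "score goal"]
instance = make_cloze_instance(document_events, other_events, rng=random.Random(0))
print(instance)
```

A model is then scored on whether it selects the entry at `answer_index` given the remaining context events.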

1 paper · 0 benchmarks

CodeSwitch-Reddit

A diverse dataset of written code-switched productions, curated from topical threads of multiple bilingual communities on the Reddit discussion platform. It is intended for exploring questions that have so far been addressed mainly in the context of spoken language.

1 paper · 0 benchmarks · Texts

Composed Quora

The Composed Quora dataset consists of questions extracted from Quora that are grouped together if they ask the same thing. The dataset contains 60,400 groups of questions, each with at least 3 questions that ask the same thing.

1 paper · 0 benchmarks · Texts

Controversial News Topic Datasets

Corpus of controversial news articles extracted from Twitter, covering three topics. Beef Ban (controversy over the slaughter and sale of beef on religious grounds; 1,543 articles) is localised to a particular region, mainly the Indian subcontinent. Gun Control (restrictions on carrying, using, or purchasing firearms; 6,494 articles) and Capital Punishment (use of the death penalty; 7,905 articles) are topical in various regions around the world.

1 paper · 0 benchmarks · Texts

COQE (Containers Of liQuid contEnt)

Contains more than 5,000 images of 10,000 liquid containers in context labelled with volume, amount of content, bounding box annotation, and corresponding similar 3D CAD models.

1 paper · 0 benchmarks · Images

COSTRA 1.0

COSTRA 1.0 is a dataset of complex sentence transformations. The dataset is intended for the study of sentence-level embeddings beyond simple word alternations or standard paraphrasing. The first version of the dataset is limited to sentences in Czech, but the construction method is universal and the authors plan to apply it to other languages as well. The dataset consists of 4,262 unique sentences with an average length of 10 words, illustrating 15 types of modifications such as simplification, generalization, or formal and informal language variation.

1 paper · 0 benchmarks · Texts

COVID19-CountryImage

The Covid19-CountryImage dataset is a Twitter dataset which contains COVID-19-related tweets.

1 paper · 0 benchmarks · Texts

CPCXR (COVID-19 Posteroanterior Chest X-Ray fused)

The COVID-19 Posteroanterior Chest X-Ray fused (CPCXR) dataset is generated by fusing three publicly available datasets: COVID-19 cxr image, Radiological Society of North America (RSNA), and the Montgomery County set collected by the U.S. National Library of Medicine (USNLM), NLM(MC). The dataset consists of samples labeled as COVID-19, Tuberculosis, Other pneumonia (SARS, MERS, etc.), and Normal. It can be used to train and evaluate deep learning and machine learning models on binary and multi-class classification problems.

1 paper · 0 benchmarks · Images, Medical

CPH

A large-scale database including substantial coding unit (CU) partition data for HEVC intra- and inter-modes. This enables deep learning on the CU partition.

1 paper · 0 benchmarks

CRL-Person

Provides two large-scale multi-step benchmarks for biometric identification, where the visual appearance of different classes is highly relevant.

1 paper · 0 benchmarks
