Datasets

19,997 machine learning datasets

Synthinel-1

Synthinel-1 is a collection of synthetic overhead imagery with full pixel-wise building segmentation labels.

TArC

A morpho-syntactically annotated Tunisian Arabish Corpus (TArC).

TCG (Traffic Control Gesture)

The TCG dataset is used to evaluate traffic control gesture recognition for autonomous driving. It uses 3D body skeleton input to classify traffic control gestures at every time step. The dataset consists of 250 sequences from several actors, ranging from 16 to 90 seconds per sequence.

4 papers, 0 benchmarks, Images

VLOG Dataset

A large collection of interaction-rich video data which are annotated and analyzed.

4 papers, 0 benchmarks

TinyVIRAT

TinyVIRAT contains natural low-resolution activities. The actions in TinyVIRAT videos have multiple labels and are extracted from surveillance videos, which makes them realistic and more challenging.

4 papers, 0 benchmarks, Videos

Tour20

Contains 140 videos with multiple human-created summaries, which were acquired in a controlled experiment.

4 papers, 0 benchmarks, Videos

CVL Traffic Signs Dataset

A video dataset for recognising traffic signs hosted with the first IEEE Video and Image Processing (VIP) Cup within the IEEE Signal Processing Society.

4 papers, 0 benchmarks

Turing Change Point Dataset

Specifically designed for the evaluation of change point detection algorithms, consisting of 37 time series from various domains.

4 papers, 0 benchmarks

UniMiB SHAR

Includes 11,771 samples of both human activities and falls performed by 30 subjects aged 18 to 60 years. Samples are divided into 17 fine-grained classes grouped into two coarse-grained classes: one containing samples of 9 types of activities of daily living (ADL) and the other containing samples of 8 types of falls. The dataset is stored with all the information needed to select samples according to different criteria, such as the type of ADL, the age, the gender, and so on.

4 papers, 0 benchmarks

UTA-RLDD (University of Texas at Arlington Real-Life Drowsiness Dataset)

Consists of around 30 hours of video, with contents ranging from subtle signs of drowsiness to more obvious ones.

4 papers, 0 benchmarks, Images

VIENA2

Covers 5 generic driving scenarios, with a total of 25 distinct action classes. It contains more than 15K full HD, 5-second videos acquired in various driving conditions, weather, times of day and environments, complemented with a common and realistic set of sensor measurements. This amounts to more than 2.25M frames, each annotated with an action label, corresponding to 600 samples per action class.

4 papers, 0 benchmarks, Images

ViMMRC (Vietnamese Multiple-choice Machine Reading Comprehension Corpus)

A challenging machine comprehension corpus with multiple-choice questions, intended for research on the machine comprehension of Vietnamese text. This corpus includes 2,783 multiple-choice questions and answers based on a set of 417 Vietnamese texts used for teaching reading comprehension for 1st to 5th graders. Answers may be extracted from the contents of single or multiple sentences in the corresponding reading text.

4 papers, 0 benchmarks, Texts

WikiAsp

A large-scale dataset for multi-domain aspect-based summarization that attempts to spur research in the direction of open-domain aspect-based summarization.

4 papers, 0 benchmarks

WikiSem500

The WikiSem500 dataset contains around 500 per-language cluster groups for English, Spanish, German, Chinese, and Japanese (a total of 13,314 test cases).

4 papers, 0 benchmarks, Texts

WISDOM (Warehouse Instance Segmentation Dataset for Object Manipulation)

Synthetic training dataset of 50,000 depth images and 320,000 object masks generated using simulated heaps of 3D CAD models.

4 papers, 1 benchmark

XED

XED is a multilingual fine-grained emotion dataset. The dataset consists of human-annotated Finnish (25k) and English sentences (30k), as well as projected annotations for 30 additional languages, providing new resources for many low-resource languages.

4 papers, 0 benchmarks

XOR-TYDI QA

A large-scale dataset built on questions from TyDi QA lacking same-language answers.

4 papers, 0 benchmarks, Texts

X-SRL

SRL is the task of extracting semantic predicate-argument structures from sentences. X-SRL is a multilingual parallel Semantic Role Labelling (SRL) corpus for English (EN), German (DE), French (FR) and Spanish (ES) that is based on English gold annotations and shares the same labelling scheme across languages.

4 papers, 0 benchmarks, Texts

Verse

Verse is a new dataset that augments existing multimodal datasets (COCO and TUHOI) with sense labels.

4 papers, 0 benchmarks, Images

RDD-2020 (Road Damage Dataset 2020)

The Road Damage Dataset 2020 (RDD-2020) is a large-scale heterogeneous dataset comprising 26,620 images collected from multiple countries using smartphones. The images are collected from roads in India, Japan and the Czech Republic.

4 papers, 0 benchmarks, Images
