Datasets

19,997 machine learning datasets

LasVR

LasVR is a large-scale video database for rain removal, consisting of 316 rain videos.

Live Comment Dataset

The Live Comment Dataset is a large-scale dataset with 2,361 videos and 895,929 live comments that were written while the videos were streamed.

2 papers | 0 benchmarks | Texts

Machine Number Sense

Consists of visual arithmetic problems automatically generated using a grammar model, the And-Or Graph (AOG). These problems take the form of geometric figures: each problem has a set of geometric shapes as its context, with number symbols embedded in them.

2 papers | 0 benchmarks

MAMe (Museum Art Medium dataset)

The MAMe dataset contains high-resolution, variable-shape images of artworks from 3 different museums.

2 papers | 1 benchmark | Images

Market1203-Reid-Dataset

This dataset contains 1203 individuals captured from two disjoint camera views. For each person, one to twelve images are captured from one to six different orientations under one camera view and normalized to 128x64 pixels. The dataset is built on the Market-1501 benchmark data, and the orientation label for each image has been manually annotated.

2 papers | 0 benchmarks | Images

MASRI-HEADSET

MASRI-HEADSET is a corpus that was developed by the MASRI project at the University of Malta. It consists of 8 hours of speech paired with text, recorded by using short text snippets in a laboratory environment. The speakers were recruited from different geographical locations all over the Maltese islands, and were roughly evenly distributed by gender.

2 papers | 0 benchmarks | Speech

MCAD (Multi-Camera Action Dataset)

Designed to evaluate the open-view classification problem in a surveillance environment. In total, MCAD contains 14,298 action samples from 18 action categories, performed by 20 subjects and independently recorded with 5 cameras.

2 papers | 0 benchmarks | Videos

MC-AFP

A dataset of around 2 million examples for machine reading comprehension.

2 papers | 0 benchmarks

MD Gender (Multi-Dimensional Gender Bias Datasets)

Provides eight automatically annotated large-scale datasets with gender information.

2 papers | 0 benchmarks

MilkQA

A question answering dataset from the dairy domain dedicated to the study of consumer questions. The dataset contains 2,657 question-answer pairs, written in Portuguese and originally collected by the Brazilian Agricultural Research Corporation (Embrapa). All questions were motivated by real situations and written by thousands of authors with very different backgrounds and levels of literacy, while the answers were written by specialists from Embrapa's customer service.

2 papers | 0 benchmarks | Texts

MITOS_WSI_CMC

A dataset of 21 whole-slide images (WSIs) of canine mammary carcinoma (CMC), completely annotated for mitotic figures (MF). For this, a pathologist screened all WSIs for potential MF and structures with a similar appearance.

2 papers | 0 benchmarks

MIZAN

Persian-English parallel corpus with more than one million sentence pairs collected from masterpieces of literature.

2 papers | 0 benchmarks | Texts

MQR

A multi-domain question rewriting dataset constructed from human-contributed Stack Exchange question edit histories. The dataset contains 427,719 question pairs drawn from 303 domains.

2 papers | 0 benchmarks

MSC

MSC is a dataset for macro-management in StarCraft II based on the SC2LE platform. It consists of well-designed feature vectors, predefined high-level actions, and the final result of each match. It contains 36,619 high-quality replays, which are unbroken and played by relatively professional players.

2 papers | 0 benchmarks | Environment

MultiReQA

MultiReQA is a cross-domain evaluation suite for retrieval question answering models. Retrieval question answering (ReQA) is the task of retrieving a sentence-level answer to a question from an open corpus. The suite comprises eight retrieval QA tasks drawn from publicly available QA datasets from the MRQA shared task, and it provides sentence boundary annotations for all eight: SearchQA, TriviaQA, HotpotQA, NaturalQuestions, SQuAD, BioASQ, RelationExtraction, and TextbookQA. Five of these (SearchQA, TriviaQA, HotpotQA, NaturalQuestions, and SQuAD) contain both training and test data; the other three (BioASQ, RelationExtraction, and TextbookQA) contain only test data.

2 papers | 0 benchmarks | Texts

MUTLA

This dataset includes time-synchronized multimodal records of students (learning logs, videos, EEG brainwaves) as they solve problems of varying difficulty across various subjects in the Squirrel AI Learning System (SAIL). The dataset resources include user records from the SAIL learner records store, brainwave data collected by EEG headset devices, and video data captured by web cameras while students worked in the SAIL products.

2 papers | 0 benchmarks

Negotiation Dialogues Dataset

This dataset consists of 5808 dialogues, based on 2236 unique scenarios. Each dialogue is converted into two training examples, each showing the complete conversation from the perspective of one agent. The perspectives differ in their input goals, their output choice, and the special tokens marking whether a statement was read or written.

2 papers | 0 benchmarks
