Datasets

3,148 machine learning datasets

Disfl-QA

Disfl-QA is a targeted dataset for contextual disfluencies in an information-seeking setting, namely question answering over Wikipedia passages. Disfl-QA builds upon the SQuAD-v2 dataset: each question in the dev set is annotated to add a contextual disfluency, using the paragraph as a source of distractors.

3 papers | 0 benchmarks | Texts

Itihasa

Itihasa is a large-scale corpus for Sanskrit-to-English translation containing 93,000 pairs of Sanskrit shlokas and their English translations. The shlokas are extracted from two Indian epics, the Ramayana and the Mahabharata.

3 papers | 1 benchmark | Texts

Multilingual TOP

Multilingual TOP is a dataset for multilingual semantic parsing with human-written sentences rather than machine-translated ones. It is based on the Facebook Task Oriented Parsing (TOP) dataset, and its sentences are in English, Italian and Japanese.

3 papers | 0 benchmarks | Texts

EMOTyDA (Emotion aware Dialogue Act)

EMOTyDA is a multimodal, emotion-aware dialogue act dataset collected from open-sourced dialogue datasets.

3 papers | 1 benchmark | Texts

HuRDL (Human-Robot Dialogue Learning Corpus)

The Human-Robot Dialogue Learning (HuRDL) Corpus is a dataset about asking questions in situated task-based interactions. It is a dialogue corpus collected in an online interactive virtual environment in which human participants play the role of a robot performing a collaborative tool-organization task.

3 papers | 0 benchmarks | Texts

LARC (Language-annotated Abstraction and Reasoning)

LARC is a dataset built from ARC (Abstraction and Reasoning Corpus). ARC is a set of tasks that tests an agent's ability to flexibly solve novel problems. While most ARC tasks are easy for humans, they are challenging for state-of-the-art AI.

3 papers | 0 benchmarks | Texts

BCOPA-CE (A Balanced COPA Test Set with cause-effect as alternatives)

BCOPA-CE is a test set in which the token distribution is balanced between the correct and wrong alternatives, which makes it harder for a model to distinguish cause from effect.

3 papers | 0 benchmarks | Texts

EmailSum (Email Thread Summarization)

Email Thread Summarization (EmailSum) is a dataset which contains human-annotated short (<30 words) and long (<100 words) summaries of 2,549 email threads (each containing 3 to 10 emails) over a wide variety of topics. It was developed to spur research in thread summarization.

3 papers | 0 benchmarks | Texts

Q-Pain

Q-Pain is a dataset for assessing bias in medical QA in the context of pain management, one of the most challenging forms of clinical decision-making.

3 papers | 0 benchmarks | Texts

IPAC (Icelandic Parallel Abstracts Corpus)

IPAC (Icelandic Parallel Abstracts Corpus) is a new Icelandic-English parallel corpus composed of abstracts from student theses and dissertations. The texts were collected from the Skemman repository, which keeps records of all theses, dissertations and final projects from students at Icelandic universities. The corpus was aligned with Bleualign, based on sentence-level BLEU scores from NMT models in both translation directions. The result is a corpus of 64k sentence pairs from over 6,000 parallel abstracts.

3 papers | 0 benchmarks | Texts

HatemojiCheck

HatemojiCheck is a test suite of 3,930 test cases for detecting emoji-based hate, covering seven functionalities of emoji-based hate and six identities.

3 papers | 0 benchmarks | Texts

WikiScenes

The WikiScenes dataset consists of paired images and language descriptions capturing world landmarks and cultural sites, with associated 3D models and camera poses. WikiScenes is derived from the massive public catalog of freely-licensed crowdsourced data in the Wikimedia Commons project, which contains a large variety of images with captions and other metadata.

3 papers | 0 benchmarks | 3D, Images, Texts

WDC-Dialogue

WDC-Dialogue is a dataset built from Chinese social media to train EVA. Conversations are gathered from various sources, and a rigorous data cleaning pipeline is applied to ensure the quality of WDC-Dialogue.

3 papers | 0 benchmarks | Texts

RareDis corpus

In the RareDis corpus, more than 5,000 rare diseases and almost 6,000 clinical manifestations are annotated. The inter-annotator agreement evaluation shows relatively high agreement (an F1-measure of 83.5% under exact-match criteria for the entities and 81.3% for the relations). Based on these results, the corpus is of high quality and represents a significant step for the field, since corpora annotated with rare diseases are scarce.

3 papers | 0 benchmarks | Texts

HeadlineCause

HeadlineCause is a dataset for detecting implicit causal relations between pairs of news headlines. The dataset includes over 5,000 headline pairs from English news and over 9,000 headline pairs from Russian news, labeled through crowdsourcing. The pairs range from totally unrelated headlines, through headlines on the same general topic, to pairs with causation and refutation relations.

3 papers | 0 benchmarks | Texts

IfAct (Identifying Human Actions Visible in Online Vlogs)

We consider the task of identifying human actions visible in online videos. We focus on the widespread genre of lifestyle vlogs, which consist of videos of people performing actions while verbally describing them. Our goal is to identify whether actions mentioned in the speech description of a video are visually present.

3 papers | 0 benchmarks | Texts, Videos

BnB

BnB is a large-scale and diverse in-domain VLN (Vision and Language Navigation) dataset.

3 papers | 0 benchmarks | Images, Texts

Commonsense-Dialogues

Commonsense-Dialogues is a crowdsourced dataset of ~11K dialogues grounded in social contexts that involve the use of commonsense. The social contexts were sourced from the train split of the SocialIQA dataset, a multiple-choice question-answering benchmark for social commonsense reasoning.

3 papers | 0 benchmarks | Texts

BenchIE

BenchIE is a benchmark and evaluation framework for comprehensive evaluation of open information extraction (OIE) systems for English, Chinese and German. In contrast to existing OIE benchmarks, BenchIE takes into account the informational equivalence of extractions: its gold standard consists of fact synsets, clusters that exhaustively list all surface forms of the same fact.

3 papers | 3 benchmarks | Texts

Konzil (Konzilsprotokolle_C)

The Konzil dataset was created by specialists at the University of Greifswald. It contains manuscripts written in modern German. The training set consists of 353 lines, the validation set of 29 lines, and the test set of 87 lines.

3 papers | 0 benchmarks | Images, Texts
