3,148 machine learning datasets
3,148 dataset results
Disfl-QA is a targeted dataset for contextual disfluencies in an information seeking setting, namely question answering over Wikipedia passages. Disfl-QA builds upon the SQuAD-v2 dataset, where each question in the dev set is annotated to add a contextual disfluency using the paragraph as a source of distractors.
Itihasa is a large-scale corpus for Sanskrit to English translation containing 93,000 pairs of Sanskrit shlokas and their English translations. The shlokas are extracted from two Indian epics viz., The Ramayana and The Mahabharata.
Multilingual TOP is a dataset for multilingual semantic parsing with human-written sentences as opposed to machine translated ones. The dataset sentences are in English, Italian and Japanese and it is based on the Facebook Task Oriented Parsing (TOP) dataset.
EMOTyDA is a multimodal Emotion aware Dialogue Act dataset collected from open-sourced dialogue datasets.
The Human-Robot Dialogue Learning (HuRDL) Corpus is a dataset about asking questions in situated task-based interactions. It is a dialogue corpus collected in an online interactive virtual environment in which human participants play the role of a robot performing a collaborative tool-organization task.
LARC is a dataset built from ARC (Abstraction and Reasoning Corpus). ARC is a set of tasks that tests an agent's ability to flexibly solve novel problems. While most ARC tasks are easy for humans, they are challenging for state-of-the-art AI.
We provide the BCOPA-CE test set, which has balanced token distribution in the correct and wrong alternatives and increases the difficulty of being aware of cause and effect.
Email Thread Summarization (EmailSum) is a dataset which contains human-annotated short (<30 words) and long (<100 words) summaries of 2,549 email threads (each containing 3 to 10 emails) over a wide variety of topics. It was developed to spur research in thread summarization.
Q-Pain, a dataset for assessing bias in medical QA in the context of pain management, one of the most challenging forms of clinical decision-making.
IPAC (Icelandic Parallel Abstracts Corpus ) is a new Icelandic-English parallel corpus, composed of abstracts from student theses and dissertations. The texts were collected from the Skemman repository which keeps records of all theses, dissertations and final projects from students at Icelandic universities. The corpus was aligned based on sentence-level BLEU scores, in both translation directions, from NMT models using Bleualign. The result is a corpus of 64k sentence pairs from over 6 thousand parallel abstracts.
HatemojiCheck is a test suite for detecting emoji-based hate of 3,930 test cases covering seven functionalities of emoji-based hate and six identities.
The WikiScenes dataset consists of paired images and language descriptions capturing world landmarks and cultural sites, with associated 3D models and camera poses. WikiScenes is derived from the massive public catalog of freely-licensed crowdsourced data in the Wikimedia Commons project, which contains a large variety of images with captions and other metadata.
WDC-Dialogue is a dataset built from the Chinese social media to train EVA. Specifically, conversations from various sources are gathered and a rigorous data cleaning pipeline is designed to enforce the quality of WDC-Dialogue.
The RareDis corpus contains more than 5,000 rare diseases and almost 6,000 clinical manifestations are annotated. Moreover, the Inter Annotator Agreement evaluation shows a relatively high agreement (F1-measure equal to 83.5% under exact match criteria for the entities and equal to 81.3% for the relations). Based on these results, this corpus is of high quality, supposing a significant step for the field since there is a scarcity of available corpus annotated with rare diseases.
HeadlineCause is a dataset for detecting implicit causal relations between pairs of news headlines. The dataset includes over 5000 headline pairs from English news and over 9000 headline pairs from Russian news labeled through crowdsourcing. The pairs vary from totally unrelated or belonging to the same general topic to the ones including causation and refutation relations.
We consider the task of identifying human actions visible in online videos. We focus on the widely spread genre of lifestyle vlogs, which consist of videos of people performing actions while verbally describing them. Our goal is to identify if actions mentioned in the speech description of a video are visually present.
BnB is a large-scale and diverse in-domain VLN (Vision and Language Navigation) dataset.
Commonsense-Dialogues is a crowdsourced dataset of ~11K dialogues grounded in social contexts involving utilization of commonsense. The social contexts used were sourced from the train split of the SocialIQA dataset, a multiple-choice question-answering based social commonsense reasoning benchmark.
BenchIE: a benchmark and evaluation framework for comprehensive evaluation of OIE systems for English, Chinese and German. In contrast to existing OIE benchmarks, BenchIE takes into account informational equivalence of extractions: our gold standard consists of fact synsets, clusters in which we exhaustively list all surface forms of the same fact.
Konzil dataset was created by specialists of the University of Greifswald. It contains manuscripts written in modern German. Train sample consists of 353 lines, validation - 29 lines and test - 87 lines.