Datasets

3,148 machine learning datasets

Open PI

Open PI is the first dataset for tracking state changes in procedural text from arbitrary domains by using an unrestricted (open) vocabulary. The dataset comprises 29,928 state changes over 4,050 sentences from 810 real-world procedural paragraphs from WikiHow.com. The state tracking task assumes a new formulation in which only the text is provided, from which a set of state changes (entity, attribute, before, after) is generated for each step, where the entity, attribute, and values must all be predicted from an open vocabulary.
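As a rough illustration of the (entity, attribute, before, after) formulation described above, the following Python sketch models one step and its state changes. The class names, the `render` helper, and the example step are all hypothetical and invented for illustration; they are not taken from the dataset or its release format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StateChange:
    """One open-vocabulary state change; every field is free text."""
    entity: str
    attribute: str
    before: str
    after: str


@dataclass
class Step:
    """A single procedural sentence plus the state changes it causes."""
    text: str
    state_changes: List[StateChange]


def render(change: StateChange) -> str:
    """Render a state change as a readable sentence (hypothetical template)."""
    return (f"{change.attribute} of {change.entity} was {change.before} "
            f"before and {change.after} afterwards")


# Invented example, not a record from Open PI.
example_step = Step(
    text="Pour the batter into the pan.",
    state_changes=[StateChange("pan", "fullness", "empty", "full")],
)

for change in example_step.state_changes:
    print(render(change))
```

Because none of the four fields is drawn from a fixed label set, a model for this task has to generate each of them as text rather than classify over a closed vocabulary.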

11 papers | 0 benchmarks | Texts

VLEP (Video-and-Language Event Prediction)

VLEP contains 28,726 future event prediction examples (along with their rationales) from 10,234 diverse TV Show and YouTube Lifestyle Vlog video clips. Each example (see Figure 1) consists of a Premise Event (a short video clip with dialogue), a Premise Summary (a text summary of the premise event), and two potential natural language Future Events (along with Rationales) written by people. These clips are on average 6.1 seconds long and are harvested from diverse event-rich sources, i.e., TV show and YouTube Lifestyle Vlog videos.

11 papers | 1 benchmark | Texts, Videos

TaxiNLI

TaxiNLI is a dataset collected based on the principles and categorizations of the aforementioned taxonomy. A subset of examples is curated from MultiNLI (Williams et al., 2018) by sampling uniformly based on the entailment label and the domain. The dataset is annotated with fine-grained category labels.

11 papers | 0 benchmarks | Texts

ClariQ

ClariQ is an extension of the Qulac dataset with additional new topics, questions, and answers in the training set. The test set is completely unseen and newly collected. Like Qulac, ClariQ consists of single-turn conversations (initial_request, followed by clarifying question and answer). In addition, it comes with synthetic multi-turn conversations (up to three turns). ClariQ features approximately 18K single-turn conversations, as well as 1.8 million multi-turn conversations.

11 papers | 0 benchmarks | Texts

methods2test

methods2test is a supervised dataset consisting of Test Cases and their corresponding Focal Methods from a set of Java software repositories. methods2test was constructed by parsing the Java projects to obtain classes and methods with their associated metadata. Next, each Test Class was matched to its corresponding Focal Class. Finally, each Test Case within a Test Class was mapped to the related Focal Method to obtain a set of Mapped Test Cases.
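The two matching stages described above can be sketched in Python as follows. The description does not say how matches are decided, so this sketch assumes simple name-based matching (stripping a `Test` prefix or suffix from class names and a `test` prefix from method names). That rule, and all function names here, are assumptions for illustration rather than the dataset authors' actual procedure.

```python
from typing import Dict, List, Optional, Set, Tuple


def match_focal_class(test_class: str, focal_classes: Set[str]) -> Optional[str]:
    """Assumed heuristic: drop a leading or trailing 'Test' from the class name."""
    candidates = []
    if test_class.startswith("Test"):
        candidates.append(test_class[len("Test"):])
    if test_class.endswith("Test"):
        candidates.append(test_class[:-len("Test")])
    for name in candidates:
        if name in focal_classes:
            return name
    return None


def match_focal_method(test_case: str, focal_methods: List[str]) -> Optional[str]:
    """Assumed heuristic: drop a leading 'test' and compare case-insensitively."""
    name = test_case[len("test"):] if test_case.startswith("test") else test_case
    for method in focal_methods:
        if method.lower() == name.lower():
            return method
    return None


def map_test_cases(
    test_classes: Dict[str, List[str]],
    focal_classes: Dict[str, List[str]],
) -> List[Tuple[str, str, str, str]]:
    """Return (test class, test case, focal class, focal method) tuples."""
    mapped = []
    for test_class, test_cases in test_classes.items():
        focal_class = match_focal_class(test_class, set(focal_classes))
        if focal_class is None:
            continue  # no focal class found, so its test cases are skipped
        for test_case in test_cases:
            focal_method = match_focal_method(test_case, focal_classes[focal_class])
            if focal_method is not None:
                mapped.append((test_class, test_case, focal_class, focal_method))
    return mapped


# Hypothetical class and method names, standing in for parsed Java metadata.
tests = {"CalculatorTest": ["testAdd", "testUnknown"], "TestParser": ["testParse"]}
focal = {"Calculator": ["add", "subtract"], "Parser": ["parse"]}
print(map_test_cases(tests, focal))
```

Test cases with no matching focal method (such as `testUnknown` here) are dropped instead of being paired with a guess, which keeps the mapped set conservative.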

11 papers | 0 benchmarks | Texts

ParCorFull (Parallel Corpus Annotated with Full Coreference)

ParCorFull is a parallel corpus annotated with full coreference chains. It was created to address an important problem that machine translation and other multilingual natural language processing (NLP) technologies face: the translation of coreference across languages. The corpus contains parallel texts for the language pair English-German, two major European languages. Despite being typologically very close, these languages still have systemic differences in the realisation of coreference, and thus pose problems for multilingual coreference resolution and machine translation. The corpus covers the genres of planned speech (public lectures) and newswire. It is richly annotated for coreference in both languages, including annotation of both nominal coreference and reference to antecedents expressed as clauses, sentences and verb phrases.

11 papers | 0 benchmarks | Texts

PhotoBook

PhotoBook is a large-scale collection of visually-grounded, task-oriented dialogues in English, designed to investigate the shared dialogue history that accumulates during conversation.

11 papers | 0 benchmarks | Images, Texts

ProtoQA

ProtoQA is a question answering dataset for training and evaluating the common sense reasoning capabilities of artificial intelligence systems in prototypical situations. The training set is gathered from an existing set of questions played on the long-running international game show FAMILY-FEUD. The hidden evaluation set is created by gathering answers for each question from 100 crowd-workers.

11 papers | 0 benchmarks | Texts

Talk the Walk

Talk The Walk is a large-scale dialogue dataset grounded in action and perception. The task involves two agents (a “guide” and a “tourist”) that communicate via natural language in order to achieve a common goal: having the tourist navigate to a given target location.

11 papers | 0 benchmarks | Images, Texts

AKCES-GEC

AKCES-GEC is a new dataset on grammatical error correction for Czech.

11 papers | 0 benchmarks | Texts

TRIPOD (TuRnIng POint Dataset)

TRIPOD contains screenplays and plot synopses with turning point (TP) annotations for 99 movies. Each movie contains:

11 papers | 0 benchmarks | Texts, Videos

ReCAM (SemEval-2021 Task 4: Reading Comprehension of Abstract Meaning)

The shared task has three subtasks. Subtasks 1 and 2 focus on evaluating machine learning models' performance with regard to two definitions of abstractness (Spreen and Schulz, 1966; Changizi, 2008), which we call imperceptibility and nonspecificity, respectively. Subtask 3 aims to provide some insight into the relationship between the two.

11 papers | 1 benchmark | Texts

PATS (Pose Audio Transcript Style)

The PATS dataset consists of a large and diverse collection of aligned pose, audio, and transcript data. With this dataset, we hope to provide a benchmark that helps develop technologies for virtual agents that generate natural and relevant gestures.

11 papers | 0 benchmarks | Audio, Texts, Videos

StylePTB

StylePTB is a fine-grained text style transfer benchmark. It consists of paired sentences undergoing 21 fine-grained stylistic changes spanning atomic lexical, syntactic, semantic, and thematic transfers of text, as well as compositions of multiple transfers which allow modelling of fine-grained stylistic changes as building blocks for more complex, high-level transfers.

11 papers | 0 benchmarks | Texts

Com2Sense (Complementary Commonsense)

Complementary Commonsense (Com2Sense) is a dataset for benchmarking the commonsense reasoning ability of NLP models. This dataset contains 4k pairs of true/false statements. The dataset is crowdsourced and enhanced with an adversarial model-in-the-loop setup to incentivize challenging samples. To facilitate a systematic analysis of commonsense capabilities, the dataset is designed along the dimensions of knowledge domains, reasoning scenarios and numeracy.

11 papers | 0 benchmarks | Texts

TimeDial

TimeDial is a crowdsourced English challenge set for temporal commonsense reasoning, formulated as a multiple choice cloze task with around 1.5k carefully curated dialogs. The dataset is derived from DailyDialog, a multi-turn dialog corpus.

11 papers | 0 benchmarks | Texts

SEDE (Stack Exchange Data Explorer)

SEDE is a dataset of 12,023 complex and diverse SQL queries with their natural language titles and descriptions, written by real users of the Stack Exchange Data Explorer in the course of natural interaction. These pairs contain a variety of real-world challenges that have rarely been reflected in other semantic parsing datasets. The goal of this dataset is to take a significant step towards evaluating Text-to-SQL models in a real-world setting. SEDE contains at least 10 times more SQL query templates (queries after canonization and anonymization of values) than other Text-to-SQL datasets, and has the most diverse set of utterances and SQL queries (in terms of 3-grams) of all single-domain datasets. SEDE introduces real-world challenges such as under-specification, usage of parameters in queries, date manipulation, and more.
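To make the notion of a query template concrete, here is a minimal Python sketch of one possible canonization and value anonymization step. The description above does not specify the actual procedure, so the regular expressions, the `<value>` placeholder, and the `to_template` name are all assumptions. A real pipeline would more likely use a SQL parser than regexes.

```python
import re


def to_template(sql: str) -> str:
    """Reduce a SQL query to a rough template (assumed procedure)."""
    # Replace single-quoted string literals, allowing '' as an escaped quote.
    s = re.sub(r"'(?:[^']|'')*'", "<value>", sql)
    # Replace standalone numeric literals; digits inside identifiers are kept.
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "<value>", s)
    # Canonize whitespace and letter case.
    return re.sub(r"\s+", " ", s).strip().lower()


# Two hypothetical queries that differ only in literal values and formatting.
q1 = "SELECT Id FROM Posts WHERE Score > 10 AND Title = 'python'"
q2 = "select Id  from Posts where Score > 250 and Title = 'it''s sql'"

print(to_template(q1))
print(to_template(q1) == to_template(q2))
```

Under this sketch, queries that differ only in their literal values collapse to the same template, so counting distinct templates measures structural variety rather than surface variety.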

11 papers | 4 benchmarks | Texts

AIT-QA (Airline Industry Table QA)

AIT-QA is a dataset for Table Question Answering (Table-QA) which is specific to the airline industry. The dataset consists of 515 questions authored by human annotators on 116 tables extracted from public U.S. SEC filings of major airline companies for the fiscal years 2017-2019. It also contains annotations pertaining to the nature of questions, marking those that require hierarchical headers, domain-specific terminology, and paraphrased forms.

11 papers | 0 benchmarks | Texts

MultiEURLEX

MultiEURLEX is a multilingual dataset for topic classification of legal documents. The dataset comprises 65k European Union (EU) laws, officially translated into 23 languages and annotated with multiple labels from the EUROVOC taxonomy. The 23 languages are official EU languages from 7 language families.

11 papers | 0 benchmarks | Texts

BCNB (Early Breast Cancer Core-Needle Biopsy WSI)

Breast cancer (BC) has become the greatest threat to women’s health worldwide. Clinically, identification of axillary lymph node (ALN) metastasis and other tumor clinical characteristics such as ER, PR, and so on, are important for evaluating the prognosis and guiding the treatment for BC patients.

11 papers | 0 benchmarks | Images, Texts
