3,148 machine learning datasets
The Twitter News URL Corpus is the largest human-labeled paraphrase corpus to date, with 51,524 sentence pairs, and the first cross-domain benchmark for automatic paraphrase identification.
Contains ~9K videos of human agents performing various actions, annotated with 3 types of commonsense descriptions.
The WordNet Language Model Probing (WNLaMPro) dataset consists of relations between keywords and words. It contains 4 different kinds of relations: Antonym, Hypernym, Cohyponym and Corruption.
JParaCrawl is an English-Japanese parallel corpus; publicly available parallel corpora for this language pair are still limited. The corpus was constructed by broadly crawling the web and automatically aligning parallel sentences, amassing over 8.7 million sentence pairs.
French Wikipedia is a dataset used for pretraining the CamemBERT French language model. It uses the official 2019 French Wikipedia dumps.
The IBM-Rank-30k is a dataset for the task of argument quality ranking. It is a corpus of 30,497 arguments carefully annotated for point-wise quality.
A question type classification dataset with 6 classes for questions about a person, location, numeric information, etc. The test split has 500 questions, and the training split has 5452 questions.
Cant (also known as doublespeak, cryptolect, argot, anti-language or secret language) is important for understanding advertising, comedies and dog-whistle politics. DogWhistle is a large and diverse Chinese dataset for creating and understanding cant from a computational linguistics perspective.
IIIT-ILST is a dataset and benchmark for scene text recognition in three Indic scripts - Devanagari, Telugu and Malayalam. IIIT-ILST contains nearly 1,000 real images for each script, annotated with scene-text bounding boxes and transcriptions.
There are two versions of the NLmaps corpus. NLmaps (v1) and its extension NLmaps v2. Both versions of the NLmaps corpus consist of questions about geographical facts that can be answered with the OpenStreetMap (OSM) database (available under the Open Database Licence). The questions are in English and have a corresponding Machine Readable Language (MRL) parse. Gold answers can be obtained by executing the gold parses against the OSM database using the NLmaps backend, which is based on the Overpass-API (available under the Affero GPL v3).
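Since gold answers are obtained by executing parses through a backend built on the Overpass-API, the kind of query involved can be sketched as follows. This is an illustrative, hand-written Overpass QL query (the area name and tag filter are examples), not actual NLmaps MRL output or backend code.

```python
# A hand-written Overpass QL query of the kind the NLmaps backend might
# ultimately issue against the OSM database (illustrative example only):
# count the cafe nodes inside a named area.
query = """
[out:json];
area[name="Heidelberg"]->.a;
node(area.a)[amenity=cafe];
out count;
"""

# To execute against a public Overpass endpoint (requires network access):
# import urllib.request, urllib.parse
# req = urllib.request.Request(
#     "https://overpass-api.de/api/interpreter",
#     data=urllib.parse.urlencode({"data": query}).encode())
# print(urllib.request.urlopen(req).read().decode())

print("amenity=cafe" in query)  # True
```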
QA-SRL Bank 2.0 is a large-scale corpus of Question-Answer driven Semantic Role Labeling (QA-SRL) annotations. The corpus consists of over 250,000 question-answer pairs for over 64,000 sentences across 3 domains and was gathered with a new crowd-sourcing scheme that was shown to have high precision and good recall at modest cost.
XL-BEL is a benchmark for cross-lingual biomedical entity linking. The benchmark spans 10 typologically diverse languages.
OTTers is a dataset of human one-turn topic transitions. In this task, models must connect two topics in a cooperative and coherent manner, by generating a "bridging" utterance connecting the new topic to the topic of the previous conversation turn.
ConvoSumm is a suite of four datasets to evaluate a model’s performance on a broad spectrum of conversation data.
Swords (Stanford Word Substitution) is a benchmark for lexical substitution, the task of finding appropriate substitutes for a target word in a context. Swords is composed of context, target word, and substitute triples (c, w, w'), each of which has a score that indicates the appropriateness of the substitute.
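The (c, w, w') triple structure described above can be sketched as a small data type. The class and field names here are illustrative, not the Swords release format, and the score range is an assumption.

```python
from dataclasses import dataclass

@dataclass
class SubstituteJudgment:
    """One Swords-style entry: a context c, a target word w, a candidate
    substitute w', and an appropriateness score (names are illustrative)."""
    context: str
    target: str
    substitute: str
    score: float  # higher = more appropriate in this context (assumed scale)

# Example entry and a ranking of candidate substitutes by score.
candidates = [
    SubstituteJudgment("The firm raised prices.", "raised", "increased", 0.9),
    SubstituteJudgment("The firm raised prices.", "raised", "lifted", 0.4),
    SubstituteJudgment("The firm raised prices.", "raised", "reared", 0.1),
]
best = max(candidates, key=lambda j: j.score)
print(best.substitute)  # increased
```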
JerichoWorld is a dataset that enables the creation of learning agents that can build knowledge graph-based world models of interactive narratives. Interactive narratives -- or text-adventure games -- are partially observable environments structured as long puzzles or quests in which an agent perceives and interacts with the world purely through textual natural language. Each individual game typically contains hundreds of locations, characters, and objects -- each with their own unique descriptions -- providing an opportunity to study the problem of giving language-based agents the structured memory necessary to operate in such worlds.
We present a reading comprehension challenge in which questions can only be answered by taking into account information from multiple sentences. The dataset is the first to study multi-sentence inference at scale, with an open-ended set of question types that requires reasoning skills.
The Russian Commitment Bank is a corpus of naturally occurring discourses whose final sentence contains a clause-embedding predicate under an entailment cancelling operator (question, modal, negation, antecedent of conditional).
The ZS-F-VQA dataset is a new split of the F-VQA dataset for the zero-shot setting. First, the original train/test splits of the F-VQA dataset are combined, and the triples whose answers appear in the top-500 by occurrence frequency are selected. Next, this set of answers is randomly divided into a new training (seen) split $\mathcal{A}_s$ and a testing (unseen) split $\mathcal{A}_u$ at a ratio of 1:1. Following the F-VQA standard dataset, the division process is repeated 5 times. Each $(i,q,a)$ triple in the original F-VQA dataset is assigned to the training set if $a \in \mathcal{A}_s$, and to the testing set otherwise. The training and testing sets share $2565$ answer instances in F-VQA, compared to $0$ in ZS-F-VQA.
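The split-construction procedure above can be sketched on toy data. The triples, the top-4 cutoff (standing in for the paper's top-500), and the fixed seed are all illustrative assumptions; the sketch only demonstrates why the resulting train/test answer overlap is zero by construction.

```python
import random
from collections import Counter

# Toy stand-in for the combined F-VQA (image, question, answer) triples;
# the real dataset is far larger.
triples = [(f"img{i}", f"q{i}", f"a{i % 10}") for i in range(100)]

# Step 1: keep only triples whose answer is among the most frequent answers
# (the paper uses the top-500; we use top-4 for this toy example).
answer_freq = Counter(a for _, _, a in triples)
top_answers = [a for a, _ in answer_freq.most_common(4)]
filtered = [t for t in triples if t[2] in top_answers]

# Step 2: randomly divide the answer set 1:1 into seen (A_s) and unseen (A_u).
random.seed(0)
shuffled = random.sample(top_answers, len(top_answers))
half = len(shuffled) // 2
seen, unseen = set(shuffled[:half]), set(shuffled[half:])

# Step 3: a triple goes to training if its answer is seen, else to testing.
train = [t for t in filtered if t[2] in seen]
test = [t for t in filtered if t[2] in unseen]

# By construction, train and test share no answers (overlap = 0).
overlap = {t[2] for t in train} & {t[2] for t in test}
print(len(overlap))  # 0
```

In the real ZS-F-VQA this division is repeated 5 times to produce 5 splits; the sketch shows a single division.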
BMELD is a bilingual (English-Chinese) dialogue corpus for neural chat translation.