Datasets

3,148 machine learning datasets

3,148 dataset results

PrOntoQA (Proof and Ontology-Generated Question-Answering)

PrOntoQA is a question-answering dataset which generates examples with chains-of-thought that describe the reasoning required to answer the questions correctly. The sentences in the examples are syntactically simple and amenable to semantic parsing. It can be used to formally analyze the predicted chain-of-thought from large language models such as GPT-3.

55 papers0 benchmarksTexts

PMC-VQA

PMC-VQA is a large-scale medical visual question-answering dataset that contains 227k VQA pairs of 149k images that cover various modalities or diseases. The question-answer pairs are generated from PMC-OA.

55 papers3 benchmarksImages, Medical, Texts

DAQUAR

DAQUAR (DAtaset for QUestion Answering on Real-world images) is a dataset of human question answer pairs about images.

54 papers0 benchmarksImages, Texts

CrossTask

CrossTask dataset contains instructional videos, collected for 83 different tasks. For each task an ordered list of steps with manual descriptions is provided. The dataset is divided in two parts: 18 primary and 65 related tasks. Videos for the primary tasks are collected manually and provided with annotations for temporal step boundaries. Videos for the related tasks are collected automatically and don't have annotations.

54 papers4 benchmarksTexts, Videos

ASSET

ASSET is a new dataset for assessing sentence simplification in English. ASSET is a crowdsourced multi-reference corpus where each simplification was produced by executing several rewriting transformations.

54 papers5 benchmarksTexts

WikiSum

WikiSum is a dataset based on English Wikipedia and suitable for a task of multi-document abstractive summarization. In each instance, the input is comprised of a Wikipedia topic (title of article) and a collection of non-Wikipedia reference documents, and the target is the Wikipedia article text. The dataset is restricted to the articles with at least one crawlable citation. The official split divides the articles roughly into 80/10/10 for train/development/test subsets, resulting in 1865750, 233252, and 232998 examples respectively.

54 papers0 benchmarksTexts

DDI

The DDIExtraction 2013 task relies on the DDI corpus which contains MedLine abstracts on drug-drug interactions as well as documents describing drug-drug interactions from the DrugBank database.

54 papers2 benchmarksTexts

C3

C3 is a free-form multiple-Choice Chinese machine reading Comprehension dataset.

54 papers0 benchmarksTexts

RxR (Room-across-Room)

Room-Across-Room (RxR) is a multilingual dataset for Vision-and-Language Navigation (VLN) for Matterport3D environments. In contrast to related datasets such as Room-to-Room (R2R), RxR is 10x larger, multilingual (English, Hindi and Telugu), with longer and more variable paths, and it includes and fine-grained visual groundings that relate each word to pixels/surfaces in the environment.

54 papers1 benchmarksTexts, Videos

Multi-Domain Sentiment

The Multi-Domain Sentiment Dataset contains product reviews taken from Amazon.com from many product types (domains). Some domains (books and dvds) have hundreds of thousands of reviews. Others (musical instruments) have only a few hundred. Reviews contain star ratings (1 to 5 stars) that can be converted into binary labels if needed.

54 papers0 benchmarksTexts

VoiceBank + DEMAND (Noisy speech database for training speech enhancement algorithms and TTS models)

VoiceBank+DEMAND is a noisy speech database for training speech enhancement algorithms and TTS models. The database was designed to train and test speech enhancement methods that operate at 48kHz. A more detailed description can be found in the paper associated with the database. Some of the noises were obtained from the Demand database, available here: http://parole.loria.fr/DEMAND/ . The speech database was obtained from the Voice Banking Corpus, available here: http://homepages.inf.ed.ac.uk/jyamagis/release/VCTK-Corpus.tar.gz .

53 papers9 benchmarksAudio, Texts

MLDoc (Multilingual Document Classification Corpus)

Multilingual Document Classification Corpus (MLDoc) is a cross-lingual document classification dataset covering English, German, French, Spanish, Italian, Russian, Japanese and Chinese. It is a subset of the Reuters Corpus Volume 2 selected according to the following design choices:

53 papers0 benchmarksTexts

DRCD (Delta Reading Comprehension Dataset)

Delta Reading Comprehension Dataset (DRCD) is an open domain traditional Chinese machine reading comprehension (MRC) dataset. This dataset aimed to be a standard Chinese machine reading comprehension dataset, which can be a source dataset in transfer learning. The dataset contains 10,014 paragraphs from 2,108 Wikipedia articles and 30,000+ questions generated by annotators.

53 papers0 benchmarksTexts

PAQ (Probably Asked Questions)

Probably Asked Questions (PAQ) is a very large resource of 65M automatically-generated QA-pairs. PAQ is a semi-structured Knowledge Base (KB) of 65M natural language QA-pairs, which models can memorise and/or learn to retrieve from. PAQ differs from traditional KBs in that questions and answers are stored in natural language, and that questions are generated such that they are likely to appear in ODQA datasets. PAQ is automatically constructed using a question generation model and Wikipedia.

53 papers0 benchmarksTexts

Bio (Bio AMR Corpus)

This corpus includes annotations of cancer-related PubMed articles, covering 3 full papers (PMID:24651010, PMID:11777939, PMID:15630473) as well as the result sections of 46 additional PubMed papers. The corpus also includes about 1000 sentences each from the BEL BioCreative training corpus and the Chicago Corpus.

53 papers2 benchmarksGraphs, Texts

MMVP

The MMVP (Multimodal Visual Patterns) Benchmark focuses on identifying "CLIP-blind pairs" – images that appear similar to the CLIP model despite having clear visual differences. These patterns highlight the challenges these systems face in answering straightforward questions, often leading to incorrect responses and hallucinated explanations.

53 papers0 benchmarksImages, Texts

HallusionBench

Large language models (LLMs), after being aligned with vision models and integrated into vision-language models (VLMs), can bring impressive improvement in image reasoning tasks. This was shown by the recently released GPT-4V(ison), LLaVA-1.5, etc. However, the strong language prior in these SOTA LVLMs can be a double-edged sword: they may ignore the image context and solely rely on the (even contradictory) language prior for reasoning. In contrast, the vision modules in VLMs are weaker than LLMs and may result in misleading visual representations, which are then translated to confident mistakes by LLMs.

53 papers2 benchmarksImages, Texts, Videos

Weibo NER

The Weibo NER dataset is a Chinese Named Entity Recognition dataset drawn from the social media website Sina Weibo.

52 papers7 benchmarksTexts

MTNT

The Machine Translation of Noisy Text (MTNT) dataset is a Machine Translation dataset that consists of noisy comments on Reddit and professionally sourced translation. The translation are between French, Japanese and French, with between 7k and 37k sentence per language pair.

52 papers0 benchmarksTexts

WikiLingua

WikiLingua includes ~770k article and summary pairs in 18 languages from WikiHow. Gold-standard article-summary alignments across languages are extracted by aligning the images that are used to describe each how-to step in an article.

52 papers0 benchmarksTexts

PreviousPage 18 of 158Next