Datasets

3,148 machine learning datasets

3,148 dataset results

COCO-Text

The COCO-Text dataset is a dataset for text detection and recognition. It is based on the MS COCO dataset, which contains images of complex everyday scenes. The COCO-Text dataset contains non-text images, legible text images and illegible text images. In total there are 22184 training images and 7026 validation images with at least one instance of legible text.

89 papers6 benchmarksImages, Texts

PathVQA

PathVQA consists of 32,799 open-ended questions from 4,998 pathology images where each question is manually checked to ensure correctness.

89 papers0 benchmarksImages, Medical, Texts

CoNLL-2012

The CoNLL-2012 shared task involved predicting coreference in English, Chinese, and Arabic, using the final version, v5.0, of the OntoNotes corpus. It was a follow-on to the English-only task organized in 2011.

89 papers0 benchmarksTexts

ReClor

Logical reasoning is an important ability to examine, analyze, and critically evaluate arguments as they occur in ordinary language as the definition from Law School Admission Council. ReClor is a dataset extracted from logical reasoning questions of standardized graduate admission examinations.

88 papers10 benchmarksTexts

KP20k

KP20k is a large-scale scholarly articles dataset with 528K articles for training, 20K articles for validation and 20K articles for testing.

87 papers7 benchmarksTexts

ASPEC (Asian Scientific Paper Excerpt Corpus)

ASPEC, Asian Scientific Paper Excerpt Corpus, is constructed by the Japan Science and Technology Agency (JST) in collaboration with the National Institute of Information and Communications Technology (NICT). It consists of a Japanese-English paper abstract corpus of 3M parallel sentences (ASPEC-JE) and a Japanese-Chinese paper excerpt corpus of 680K parallel sentences (ASPEC-JC). This corpus is one of the achievements of the Japanese-Chinese machine translation project which was run in Japan from 2006 to 2010.

87 papers0 benchmarksTexts

CLINC150

This dataset is for evaluating the performance of intent classification systems in the presence of "out-of-scope" queries, i.e., queries that do not fall into any of the system-supported intent classes. The dataset includes both in-scope and out-of-scope data.

87 papers7 benchmarksTexts

MovieQA

The MovieQA dataset is a dataset for movie question answering. to evaluate automatic story comprehension from both video and text. The data set consists of almost 15,000 multiple choice question answers obtained from over 400 movies and features high semantic diversity. Each question comes with a set of five highly plausible answers; only one of which is correct. The questions can be answered using multiple sources of information: movie clips, plots, subtitles, and for a subset scripts and DVS.

86 papers1 benchmarksTexts, Videos

FLoRes-101

FLoRes-101 is an evaluation benchmark for low-resource and multilingual machine translation. It consists of 3001 sentences extracted from English Wikipedia, covering a variety of different topics and domains. These sentences have been translated into 101 languages by professional translators through a carefully controlled process.

86 papers0 benchmarksTexts

ToxiGen

A large-scale and machine-generated dataset of 274,186 toxic and benign statements about 13 minority groups.

85 papers0 benchmarksTexts

WikiBio (Wikipedia Biography Dataset)

This dataset gathers 728,321 biographies from English Wikipedia. It aims at evaluating text generation algorithms. For each article, we provide the first paragraph and the infobox (both tokenized).

84 papers6 benchmarksTexts

How2

The How2 dataset contains 13,500 videos, or 300 hours of speech, and is split into 185,187 training, 2022 development (dev), and 2361 test utterances. It has subtitles in English and crowdsourced Portuguese translations.

84 papers3 benchmarksAudio, Texts, Videos

TweetEval

TweetEval introduces an evaluation framework consisting of seven heterogeneous Twitter-specific classification tasks.

84 papers8 benchmarksTexts

MiniF2F

MiniF2F is a dataset of formal Olympiad-level mathematics problems statements intended to provide a unified cross-system benchmark for neural theorem proving. The miniF2F benchmark currently targets Metamath, Lean, and Isabelle and consists of 488 problem statements drawn from the AIME, AMC, and the International Mathematical Olympiad (IMO), as well as material from high-school and undergraduate mathematics courses.

84 papers0 benchmarksTexts

CodeContests

CodeContests is a competitive programming dataset for machine-learning. This dataset was used when training AlphaCode.

84 papers4 benchmarksTexts

MusicCaps

MusicCaps is a dataset composed of 5.5k music-text pairs, with rich text descriptions provided by human experts. For each 10-second music clip, MusicCaps provides:

84 papers7 benchmarksMusic, Texts

NLVR (Natural Language Visual Reasoningnatural language for visual reasoning)

NLVR contains 92,244 pairs of human-written English sentences grounded in synthetic images. Because the images are synthetically generated, this dataset can be used for semantic parsing.

83 papers3 benchmarksImages, Texts

GovReport

GovReport is a dataset for long document summarization, with significantly longer documents and summaries. It consists of reports written by government research agencies including Congressional Research Service and U.S. Government Accountability Office.

83 papers9 benchmarksTexts

Referring Expressions for DAVIS 2016 & 2017

Our task is to localize and provide a pixel-level mask of an object on all video frames given a language referring expression obtained either by looking at the first frame only or the full video. To validate our approach we employ two popular video object segmentation datasets, DAVIS16 [38] and DAVIS17 [42]. These two datasets introduce various challenges, containing videos with single or multiple salient objects, crowded scenes, similar looking instances, occlusions, camera view changes, fast motion, etc.

82 papers6 benchmarksTexts

Douban (Douban Conversation Corpus)

We release Douban Conversation Corpus, comprising a training data set, a development set and a test set for retrieval based chatbot. The statistics of Douban Conversation Corpus are shown in the following table.

81 papers16 benchmarksTexts

PreviousPage 13 of 158Next