Datasets

3,148 machine learning datasets

3,148 dataset results

WMT 2018

WMT 2018 is a collection of datasets used in shared tasks of the Third Conference on Machine Translation. The conference builds on a series of twelve previous annual workshops and conferences on Statistical Machine Translation.

37 papers0 benchmarksTexts

HOC (Hallmarks of Cancer)

The Hallmarks of Cancer (*HOC) corpus consists of 1852 PubMed publication abstracts manually annotated by experts according to the Hallmarks of Cancer taxonomy. The taxonomy consists of 37 classes in a hierarchy. Zero or more class labels are assigned to each sentence in the corpus.

37 papers6 benchmarksTexts

SemEval-2018 Task-9

The SemEval-2018 hypernym discovery evaluation benchmark (Camacho-Collados et al. 2018) contains three domains (general, medical and music) and is also available in Italian and Spanish (not in this repository). For each domain a target corpus and vocabulary (i.e. hypernym search space) are provided. The dataset contains both concepts (e.g. dog) and entities (e.g. Manchester United) up to trigrams.

37 papers0 benchmarksTexts

TextOCR

TextOCR is a dataset to benchmark text recognition on arbitrary shaped scene-text. TextOCR requires models to perform text-recognition on arbitrary shaped scene-text present on natural images. TextOCR provides ~1M high quality word annotations on TextVQA images allowing application of end-to-end reasoning on downstream tasks such as visual question answering or image captioning.

37 papers0 benchmarksImages, Texts

Yelp Review Polarity

The Yelp Reviews Polarity dataset is obtained from the Yelp Dataset Challenge in 2015 (1,569,264 samples that have review text).

37 papers0 benchmarksTexts

CV-Bench (Cambrian Vision-Centric Benchmark)

The Cambrian Vision-Centric Benchmark (CV-Bench) is designed to address the limitations of existing vision-centric benchmarks by providing a comprehensive evaluation framework for multimodal large language models (MLLMs). With 2,638 manually-inspected examples, CV-Bench significantly surpasses other vision-centric MLLM benchmarks, offering 3.5 times more examples than RealWorldQA and 8.8 times more than MMVP.

37 papers0 benchmarksImages, Texts

EBM-NLP

EBM-NLP annotates PICO (Participants, Interventions, Comparisons and Outcomes) spans in clinical trial abstracts. The corresponding PICO Extraction task aims to identify the spans in clinical trial abstracts that describe the respective PICO elements.

36 papers2 benchmarksTexts

DialogRE

DialogRE is the first human-annotated dialogue-based relation extraction dataset, containing 1,788 dialogues originating from the complete transcripts of a famous American television situation comedy Friends. The are annotations for all occurrences of 36 possible relation types that exist between an argument pair in a dialogue. DialogRE is available in English and Chinese.

36 papers5 benchmarksTexts

stocknet (stocknet-dataset)

stocknet-dataset This repository releases a comprehensive dataset for stock movement prediction from tweets and historical stock prices. Please cite the following paper [bib] if you use this dataset,

36 papers2 benchmarksTexts, Time series

Panlex

PanLex translates words in thousands of languages. Its database is panlingual (emphasizes coverage of every language) and lexical (focuses on words, not sentences).

36 papers0 benchmarksTexts

Fashion-Gen

Fashion-Gen consists of 293,008 high definition (1360 x 1360 pixels) fashion images paired with item descriptions provided by professional stylists. Each item is photographed from a variety of angles.

36 papers0 benchmarksImages, Texts

OTT-QA

The Open Table-and-Text Question Answering (OTT-QA) dataset contains open questions which require retrieving tables and text from the web to answer. This dataset is re-annotated from the previous HybridQA dataset. The dataset is collected by UCSB NLP group and issued under MIT license.

36 papers1 benchmarksTexts

VisualMRC (VisualMRC: Machine Reading Comprehension on Document Images)

VisualMRC is a visual machine reading comprehension dataset that proposes a task: given a question and a document image, a model produces an abstractive answer.

36 papers2 benchmarksImages, Texts

Doc2Dial (Doc2Dial: Document-grounded Dialogue)

For goal-oriented document-grounded dialogs, it often involves complex contexts for identifying the most relevant information, which requires better understanding of the inter-relations between conversations and documents. Meanwhile, many online user-oriented documents use both semi-structured and unstructured contents for guiding users to access information of different contexts. Thus, we create a new goal-oriented document-grounded dialogue dataset that captures more diverse scenarios derived from various document contents from multiple domains such ssa.gov and studentaid.gov. For data collection, we propose a novel pipeline approach for dialogue data construction, which has been adapted and evaluated for several domains.

36 papers0 benchmarksTexts

AdvGLUE (Adversarial GLUE)

Adversarial GLUE (AdvGLUE) is a new multi-task benchmark to quantitatively and thoroughly explore and evaluate the vulnerabilities of modern large-scale language models under various types of adversarial attacks. In particular, we systematically apply 14 textual adversarial attack methods to GLUE tasks to construct AdvGLUE, which is further validated by humans for reliable annotations.

36 papers1 benchmarksTexts

MAD

MAD (Movie Audio Descriptions) is an automatically curated large-scale dataset for the task of natural language grounding in videos or natural language moment retrieval. MAD exploits available audio descriptions of mainstream movies. Such audio descriptions are redacted for visually impaired audiences and are therefore highly descriptive of the visual content being displayed. MAD contains over 384,000 natural language sentences grounded in over 1,200 hours of video, and provides a unique setup for video grounding as the visual stream is truly untrimmed with an average video duration of 110 minutes. 2 orders of magnitude longer than legacy datasets.

36 papers29 benchmarksTexts, Videos

InfoSeek (Visual Information Seeking)

In this project, we introduce InfoSeek, a visual question answering dataset tailored for information-seeking questions that cannot be answered with only common sense knowledge. Using InfoSeek, we analyze various pre-trained visual question answering models and gain insights into their characteristics. Our findings reveal that state-of-the-art pre-trained multi-modal models (e.g., PaLI-X, BLIP2, etc.) face challenges in answering visual information-seeking questions, but fine-tuning on the InfoSeek dataset elicits models to use fine-grained knowledge that was learned during their pre-training.

36 papers2 benchmarksImages, Texts

Open Entity

The Open Entity dataset is a collection of about 6,000 sentences with fine-grained entity types annotations. The entity types are free-form noun phrases that describe appropriate types for the role the target entity plays in the sentence. Sentences were sampled from Gigaword, OntoNotes and web articles. On average each sentence has 5 labels.

35 papers1 benchmarksTexts

doc2dial

A new dataset of goal-oriented dialogues that are grounded in the associated documents.

35 papers0 benchmarksTexts

Spot-the-diff

Spot-the-diff is a dataset consisting of 13,192 image pairs along with corresponding human provided text annotations stating the differences between the two images.

35 papers0 benchmarksImages, Texts

PreviousPage 23 of 158Next