Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

3,148 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

3,148 dataset results

OVQA

OVQA contains 19,020 medical visual question–answer pairs generated from 2,001 medical images, collected from 2,212 electronic medical records (EMRs) in orthopedics.

4 papers · 0 benchmarks · Images, Medical, Texts

A Game Of Sorts

A Game Of Sorts is a collaborative image ranking task. Players are asked to rank a set of images based on a given sorting criterion. The game provides a framework for the evaluation of visually grounded language understanding and generation of referring expressions in multimodal dialogue settings.

4 papers · 0 benchmarks · Images, Texts

Mindgames

Mindgames is a set of epistemic reasoning problems generated with modal logic to probe theory of mind (ToM) in natural language processing models.

4 papers · 0 benchmarks · Texts

Szeged Corpus

The Szeged Treebank is the largest fully manually annotated treebank of the Hungarian language. It contains 82,000 sentences, 1.2 million words, and 250,000 punctuation marks. Texts were selected from six different domains, each roughly 200,000 words in size.

4 papers · 0 benchmarks · Texts

CompMix

CompMix is a crowdsourced QA benchmark that naturally demands integrating a mixture of input sources. It contains a total of 9,410 questions and features several complex intents, such as joins and temporal conditions.

4 papers · 0 benchmarks · Texts

ALTA 2021 Shared Task (Automatic Grading of Evidence, 10 years later)

This dataset is described on the ALTA 2021 Shared Task website and in the associated CodaLab competition.

4 papers · 0 benchmarks · Texts

CommitChronicle

CommitChronicle is a dataset for commit message generation and completion.

4 papers · 0 benchmarks · Texts

OVDEval

OVDEval includes 9 sub-tasks and introduces evaluations on commonsense knowledge, attribute understanding, position understanding, object relation comprehension, and more. The dataset is meticulously created to provide hard negatives that challenge models' true understanding of visual and linguistic input.

4 papers · 0 benchmarks · Images, Texts

DiaASQ (Conversational Aspect-based Sentiment Quadruple Extraction)

DiaASQ is a fine-grained aspect-based sentiment analysis (ABSA) benchmark set in a conversational scenario. It challenges existing ABSA methods by (1) requiring the extraction of target–aspect–opinion–sentiment quadruples from a dialogue and (2) requiring models to capture dialogue discourse structure. The dataset was constructed by systematically crawling tweets from digital bloggers, then filtering, normalizing, pruning, and annotating the collected dialogues, yielding a final corpus of 1,000 dialogues. For multilingual usability, DiaASQ is available in both English and Chinese. A schematic example of a quadruple follows below.

4 papers · 0 benchmarks · Texts
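
To make the task format concrete, here is a minimal sketch of what a single annotation might look like. The class and field names (SentimentQuadruple, target, aspect, opinion, sentiment), the polarity labels, and the example values are illustrative assumptions, not DiaASQ's official schema.

```python
from dataclasses import dataclass

@dataclass
class SentimentQuadruple:
    """One target–aspect–opinion–sentiment annotation (hypothetical schema)."""
    target: str     # the entity under discussion, e.g. a phone model
    aspect: str     # the attribute of the target being evaluated
    opinion: str    # the opinion expression span from the dialogue
    sentiment: str  # polarity label, e.g. "pos", "neg", or "neu"

# A dialogue is annotated with a list of such quadruples; their elements
# may be scattered across different utterances and speakers.
quad = SentimentQuadruple(
    target="Phone X",             # all values invented for illustration
    aspect="battery life",
    opinion="drains really fast",
    sentiment="neg",
)
print(quad)
```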

FinVis

Pre-training: 200k; instruction tuning: 100k.

4 papers · 0 benchmarks · Images, Texts

ITALIC

ITALIC (ITALian Intent Classification) is a dataset for intent classification in Italian.

4 papers · 0 benchmarks · Audio, Texts

KnowEdit

KnowEdit is a dataset for knowledge editing. It contains six tasks: ZsRE, Wiki_recent, Wiki_counterfact, WikiBio, ConvSent, and Sanitation. This repository covers the first four tasks; the data for ConvSent and Sanitation can be obtained from their original papers.

4 papers · 0 benchmarks · Texts

MetaHate

MetaHate is a meta-collection of 36 hate speech datasets drawn from social media comments, built to unify efforts on hate speech detection.

4 papers · 0 benchmarks · Texts

CHOCOLATE (Captions Have Often ChOsen Lies About The Evidence)

CHOCOLATE is a benchmark for detecting and correcting factual inconsistencies in generated chart captions. It consists of captions produced by six advanced models, categorized into three subsets.

4 papers · 1 benchmark · Images, Texts

Multi-Label Classification Dataset Repository

For each dataset, the repository provides a short description as well as several characterization metrics: the number of instances (m), number of attributes (d), number of labels (q), cardinality (Card), density (Dens), diversity (Div), average imbalance ratio per label (avgIR), ratio of unconditionally dependent label pairs by chi-square test (rDep), and complexity, defined as m × q × d as in [Read 2010]. Cardinality measures the average number of labels associated with each instance, and density is cardinality divided by the number of labels. Diversity is the number of distinct labelsets present in the dataset divided by the number of possible labelsets. avgIR measures the average degree of imbalance across all labels; the greater the avgIR, the more imbalanced the dataset. Finally, rDep measures the proportion of label pairs that are dependent at 99% confidence. A broader description of all the characterization metrics and of the partition methods used is also available. A sketch of how these metrics can be computed is shown below.

4 papers · 0 benchmarks · Audio, Biology, Images, Medical, Music, Texts, Videos
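
Illustrative only: a minimal NumPy/SciPy sketch of how these characterization metrics could be computed from a binary label matrix. The function name characterize, the handling of never-occurring labels, and the restriction of the chi-square test to non-constant label pairs are assumptions of this sketch, not the repository's reference implementation.

```python
import numpy as np
from scipy.stats import chi2_contingency

def characterize(Y: np.ndarray, d: int) -> dict:
    """Characterization metrics for a binary label matrix Y (m instances x q labels).

    `d` is the number of input attributes; it is used only for the complexity metric.
    """
    m, q = Y.shape
    card = Y.sum(axis=1).mean()                       # Card: mean labels per instance
    dens = card / q                                   # Dens: Card / q
    labelsets = {tuple(row) for row in Y.astype(int).tolist()}
    div = len(labelsets) / 2 ** q                     # Div: observed / possible labelsets
    counts = Y.sum(axis=0).astype(float)
    counts[counts == 0] = np.nan                      # ignore labels that never occur
    avg_ir = np.nanmean(np.nanmax(counts) / counts)   # avgIR: mean imbalance ratio per label
    dep = pairs = 0
    for i in range(q):
        for j in range(i + 1, q):
            # 2x2 contingency table of label i vs. label j
            table = np.array([[np.sum((Y[:, i] == a) & (Y[:, j] == b))
                               for b in (0, 1)] for a in (0, 1)])
            if (table.sum(axis=0) == 0).any() or (table.sum(axis=1) == 0).any():
                continue                              # chi-square undefined for constant labels
            pairs += 1
            _, p, _, _ = chi2_contingency(table)
            dep += p < 0.01                           # dependent at 99% confidence
    return {"Card": card, "Dens": dens, "Div": div, "avgIR": avg_ir,
            "rDep": dep / pairs if pairs else 0.0, "complexity": m * q * d}

# Example: 6 instances, 3 labels; pretend the dataset has d = 10 attributes.
Y = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 1], [1, 0, 1], [0, 1, 0], [1, 0, 0]])
print(characterize(Y, d=10))
```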

Polaris

The Polaris dataset offers a large-scale, diverse benchmark for evaluating metrics for image captioning, surpassing existing datasets in terms of size, caption diversity, number of human judgments, and granularity of the evaluations. It includes 131,020 generated captions and 262,040 reference captions. The generated captions have a vocabulary of 3,154 unique words and the reference captions have a vocabulary of 22,275 unique words.

4 papers · 0 benchmarks · Images, Texts

VietMed (VietMed: A Dataset and Benchmark for Automatic Speech Recognition of Vietnamese in the Medical Domain)

We introduce a Vietnamese speech recognition dataset in the medical domain comprising 16 hours of labeled medical speech, 1,000 hours of unlabeled medical speech, and 1,200 hours of unlabeled general-domain speech. To the best of our knowledge, VietMed is by far the world's largest public medical speech recognition dataset in seven aspects: total duration, number of speakers, diseases, recording conditions, speaker roles, unique medical terms, and accents. VietMed is also by far the largest public Vietnamese speech dataset in terms of total duration. Additionally, we are the first to present a medical ASR dataset covering all ICD-10 disease groups and all accents within a country.

4 papers · 3 benchmarks · Audio, Medical, Speech, Texts

LLM-Seg40K

The LLM-Seg40K dataset contains 14K images in total, divided into training, validation, and test sets of 11K, 1K, and 2K images respectively. In the training split, each image has 3.95 questions on average, and the average question length is 15.2 words. The training set covers 1,458 distinct categories.

4 papers · 0 benchmarks · Images, Texts

OpenViVQA (Open-domain Visual Question Answering in Vietnamese)

In recent years, visual question answering (VQA) has attracted attention from the research community because of its potential applications (such as virtual assistants in intelligent cars, assistive devices for blind people, or information retrieval from document images using natural language queries) and its challenges. The VQA task requires methods that can fuse information from questions and images to produce appropriate answers. Neural VQA models have achieved tremendous growth on large-scale datasets, which are mostly for resource-rich languages such as English. However, available datasets narrow the VQA task to answer selection or answer classification. We argue that this form of VQA is far from human ability and eliminates the challenge of the answering aspect of the task by merely selecting answers rather than generating them. In this paper, we introduce the OpenViVQA (Open-domain Vietnamese Visual Question Answering) dataset.

4 papers · 0 benchmarks · Images, Texts

EVJVQA (English-Japanese-Vietnamese Visual Question Answering)

EVJVQA (UIT-EVJVQA) is the first multilingual visual question answering dataset covering three languages: English, Vietnamese, and Japanese. It includes question–answer pairs created by humans for a set of images taken in Vietnam, with each answer derived from the input question and the corresponding image. EVJVQA consists of 33,000+ question–answer pairs for evaluating mQA models.

4 papers · 0 benchmarks · Images, Texts
Page 72 of 158