Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images: 3,275
  • Texts: 3,148
  • Videos: 1,019
  • Audio: 486
  • Medical: 395
  • 3D: 383
  • Time series: 298
  • Graphs: 285
  • Tabular: 271
  • Speech: 199
  • RGB-D: 192
  • Environment: 148
  • Point cloud: 135
  • Biomedical: 123
  • LiDAR: 95
  • RGB Video: 87
  • Tracking: 78
  • Biology: 71
  • Actions: 68
  • 3D meshes: 65
  • Tables: 52
  • Music: 48
  • EEG: 45
  • Hyperspectral images: 45
  • Stereo: 44
  • MRI: 39
  • Physics: 32
  • Interactive: 29
  • Dialog: 25
  • MIDI: 22
  • 6D: 17
  • Replay data: 11
  • Financial: 10
  • Ranking: 10
  • CAD: 9
  • fMRI: 7
  • Parallel: 6
  • Lyrics: 2
  • PSG: 2

19,997 dataset results

Quora Question Pairs

The Quora Question Pairs (QQP) dataset consists of over 400,000 question pairs, each annotated with a binary value indicating whether the two questions are paraphrases of each other.

55 papers · 25 benchmarks · Texts
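A minimal sketch of loading and inspecting QQP, assuming the GLUE copy hosted on the Hugging Face Hub (the `datasets` package and the "glue"/"qqp" identifiers are assumptions, not part of this listing):

```python
# Minimal sketch: load QQP through the GLUE collection on the Hugging Face Hub.
# The "glue"/"qqp" identifiers and field names are assumptions about that copy.
from datasets import load_dataset

qqp = load_dataset("glue", "qqp", split="train")

# Each record pairs two questions with a binary paraphrase label (1 = duplicate).
example = qqp[0]
print(example["question1"])
print(example["question2"])
print("duplicate" if example["label"] == 1 else "not duplicate")
```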

REVERB Challenge (REverberant Voice Enhancement and Recognition Benchmark)

The REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge is a benchmark for evaluating automatic speech recognition techniques. The challenge assumes the scenario of capturing utterances spoken by a single stationary distant-talking speaker with 1-channel, 2-channel or 8-channel microphone arrays in reverberant meeting rooms. It features both real recordings and simulated data.

55 papers · 0 benchmarks · Audio, Speech

QUASAR-T (QUestion Answering by Search And Reading – Trivia)

QUASAR-T is a large-scale dataset aimed at evaluating systems designed to comprehend a natural language query and extract its answer from a large corpus of text. It consists of 43,013 open-domain trivia questions and their answers obtained from various internet sources. ClueWeb09 serves as the background corpus for extracting these answers. The answers to these questions are free-form spans of text, though most are noun phrases.

55 papers · 0 benchmarks · Texts

ISEAR (International Survey on Emotion Antecedents and Reactions)

Over a period of many years during the 1990s, a large group of psychologists all over the world collected data in the ISEAR project, directed by Klaus R. Scherer and Harald Wallbott. Student respondents, both psychologists and non-psychologists, were asked to report situations in which they had experienced all of 7 major emotions (joy, fear, anger, sadness, disgust, shame, and guilt). In each case, the questions covered the way they had appraised the situation and how they reacted. The final data set thus contained reports on seven emotions each by close to 3000 respondents in 37 countries on all 5 continents.

55 papers · 0 benchmarks · Texts
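As a rough illustration of the structure the ISEAR self-reports lend themselves to (the field names and sample text below are invented, not the official release schema), each report maps a situation description to one of the seven emotion labels:

```python
# Illustrative-only ISEAR-style record; field names and the sample text are invented.
ISEAR_EMOTIONS = {"joy", "fear", "anger", "sadness", "disgust", "shame", "guilt"}

record = {
    "text": "When I passed an exam I did not expect to pass.",
    "emotion": "joy",
}

assert record["emotion"] in ISEAR_EMOTIONS
```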

MuTual

MuTual is a retrieval-based dataset for multi-turn dialogue reasoning, which is modified from Chinese high school English listening comprehension test data. It tests dialogue reasoning via next utterance prediction.

55 papers · 0 benchmarks · Texts
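A sketch of the next-utterance-prediction setup MuTual tests (the field names and texts below are invented for illustration, not the released schema): given a multi-turn context, the model must pick the one candidate response that is logically consistent with it.

```python
# Illustrative-only MuTual-style example; field names and texts are invented.
example = {
    "context": [
        "m: I missed the 8 o'clock train this morning.",
        "f: So how did you get to the office?",
    ],
    "candidates": [
        "m: I took the 8 o'clock train as usual.",     # contradicts the context
        "m: I caught the next one and arrived late.",  # logically consistent
        "m: I never take the train.",                  # contradicts the context
        "m: The office was closed this morning.",      # not supported by the context
    ],
    "answer": 1,  # index of the candidate that follows from the dialogue
}

print(example["candidates"][example["answer"]])
```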

XTREME (Cross-Lingual Transfer Evaluation of Multilingual Encoders)

The Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark was introduced to encourage more research on multilingual transfer learning. XTREME covers 40 typologically diverse languages spanning 12 language families and includes 9 tasks that require reasoning about different levels of syntax or semantics.

55 papers · 12 benchmarks · Texts
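A minimal sketch for pulling a single XTREME task, assuming the copy on the Hugging Face Hub that exposes per-task configurations (the "xtreme" identifier and the "XNLI" config name are assumptions; inspecting the keys avoids guessing the per-task schema):

```python
# Minimal sketch: load one XTREME task from the Hugging Face Hub.
# The "xtreme" identifier and the "XNLI" config name are assumptions about that copy.
from datasets import load_dataset

xnli = load_dataset("xtreme", "XNLI", split="test")

# Inspect the schema rather than assuming field names for this task.
row = xnli[0]
print(sorted(row.keys()))
print(row)
```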

OpenDialKG

OpenDialKG contains utterances from 15K human-to-human role-playing dialogs, each manually annotated with ground-truth references to the corresponding entities and paths from a large-scale knowledge graph with 1M+ facts.

55 papers · 0 benchmarks

WoodScape

Fisheye cameras are commonly employed to obtain a large field of view in surveillance, augmented reality and, in particular, automotive applications. Despite their prevalence, there are few public datasets for detailed evaluation of computer vision algorithms on fisheye images. WoodScape is an extensive fisheye automotive dataset named after Robert Wood, who invented the fisheye camera in 1906. WoodScape comprises four surround-view cameras and nine tasks including segmentation, depth estimation, 3D bounding box detection and soiling detection. Semantic annotation of 40 classes at the instance level is provided for over 10,000 images, and annotations for the other tasks are provided for over 100,000 images.

55 papers · 2 benchmarks · Images

Tanks and Temples

We present a benchmark for image-based 3D reconstruction. The benchmark sequences were acquired outside the lab, in realistic conditions. Ground-truth data was captured using an industrial laser scanner. The benchmark includes both outdoor scenes and indoor environments. High-resolution video sequences are provided as input, supporting the development of novel pipelines that take advantage of video input to increase reconstruction fidelity. We report the performance of many image-based 3D reconstruction pipelines on the new benchmark. The results point to exciting challenges and opportunities for future work.

55 papers · 6 benchmarks

Shifts

The Shifts Dataset is a dataset for evaluation of uncertainty estimates and robustness to distributional shift. The dataset, which has been collected from industrial sources and services, is composed of three tasks, each corresponding to a particular data modality: tabular weather prediction, machine translation, and self-driving car (SDC) vehicle motion prediction. All of these data modalities and tasks are affected by real, "in-the-wild" distributional shifts and pose interesting challenges with respect to uncertainty estimation.

55 papers · 1 benchmark · Texts, Time series

VLN-CE (Vision-and-Language Navigation in Continuous Environments)

Vision and Language Navigation in Continuous Environments (VLN-CE) is an instruction-guided navigation task with crowdsourced instructions, realistic environments, and unconstrained agent navigation. The dataset consists of 4475 trajectories converted from the Room-to-Room train and validation splits. For each trajectory, multiple natural language instructions from Room-to-Room are provided, along with a pre-computed shortest path that follows the waypoints via low-level actions.

55 papers · 2 benchmarks
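A rough sketch of what a single VLN-CE episode carries (the field names and values below are invented for illustration and do not mirror the released episode files): a natural language instruction, the waypoint path, and the low-level actions that trace it.

```python
# Illustrative-only VLN-CE-style episode; field names and values are invented.
episode = {
    "episode_id": 0,
    "instruction": "Walk past the couch, turn left at the kitchen, and stop by the fridge.",
    "reference_path": [(0.0, 0.0), (2.4, 0.0), (2.4, 1.8)],  # waypoints in metres
    "actions": ["MOVE_FORWARD", "MOVE_FORWARD", "TURN_LEFT", "MOVE_FORWARD", "STOP"],
}

# The pre-computed shortest path is expressed as a low-level action sequence ending in STOP.
assert episode["actions"][-1] == "STOP"
```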

Xia and Ding, 2019

Emotion-cause pair extraction (ECPE) aims to extract the potential pairs of emotions and corresponding causes in a document. This dataset consists of 1,945 Chinese documents from the SINA NEWS website.

55 papers · 0 benchmarks · Texts
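A sketch of what an emotion-cause pair extraction instance looks like (the clause texts are invented English stand-ins and the field names are assumptions): a document is split into clauses, and the target is the set of (emotion clause, cause clause) index pairs.

```python
# Illustrative-only ECPE instance; clause texts and field names are invented.
document = {
    "clauses": [
        "I lost my wallet on the bus yesterday",   # 0: cause clause
        "and I was really upset about it",         # 1: emotion clause
        "so I reported it to the driver",          # 2
    ],
    "emotion_cause_pairs": [(1, 0)],  # (emotion clause index, cause clause index)
}

for emo, cause in document["emotion_cause_pairs"]:
    print(f"emotion: {document['clauses'][emo]!r} <- cause: {document['clauses'][cause]!r}")
```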

PrOntoQA (Proof and Ontology-Generated Question-Answering)

PrOntoQA is a question-answering dataset that generates examples with chains of thought describing the reasoning required to answer the questions correctly. The sentences in the examples are syntactically simple and amenable to semantic parsing. It can be used to formally analyze the predicted chain of thought from large language models such as GPT-3.

55 papers · 0 benchmarks · Texts
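A sketch of a PrOntoQA-style example (the rules, question, and field names below are invented for illustration): a few synthetic ontology statements, a question, and the chain of thought a model is expected to reproduce before giving the answer.

```python
# Illustrative-only PrOntoQA-style example; texts and field names are invented.
example = {
    "context": [
        "Every wumpus is a yumpus.",
        "Every yumpus is a numpus.",
        "Alex is a wumpus.",
    ],
    "question": "Is Alex a numpus?",
    "chain_of_thought": [
        "Alex is a wumpus.",
        "Every wumpus is a yumpus, so Alex is a yumpus.",
        "Every yumpus is a numpus, so Alex is a numpus.",
    ],
    "answer": True,
}

print(example["question"], "->", example["answer"])
```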

PMC-VQA

PMC-VQA is a large-scale medical visual question-answering dataset that contains 227k VQA pairs over 149k images covering various modalities and diseases. The question-answer pairs are generated from PMC-OA.

55 papers · 3 benchmarks · Images, Medical, Texts

Tiny-ImageNet-C

Tiny ImageNet-C is an open-source dataset of algorithmically generated corruptions applied to the test set of Tiny ImageNet (ImageNet-200), which contains 200 classes, following the concept of ImageNet-C. It was introduced by Hendrycks et al. ("Benchmarking Neural Network Robustness to Common Corruptions and Perturbations") and comprises 19 different corruptions (15 test corruptions and 4 validation corruptions) spanning 5 severity levels. This results in 200,000 images for the validation set and 750,000 images for the test set. For further information, visit the original GitHub repository of ImageNet-C.

55 papers · 0 benchmarks · Images
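The quoted image counts follow directly from the corruption-by-severity grid, assuming the standard 10,000-image Tiny ImageNet evaluation split as the base (that split size is an assumption, not stated above):

```python
# Sanity-check the Tiny ImageNet-C image counts quoted above.
# Assumes the standard 10,000-image Tiny ImageNet evaluation split as the base.
base_images = 10_000
severities = 5
test_corruptions = 15
val_corruptions = 4

print(base_images * test_corruptions * severities)  # 750000 test images
print(base_images * val_corruptions * severities)   # 200000 validation images
```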

DAQUAR

DAQUAR (DAtaset for QUestion Answering on Real-world images) is a dataset of human question-answer pairs about images.

54 papers · 0 benchmarks · Images, Texts

FG-NET

FG-NET is a dataset for age estimation and face recognition across ages. It is composed of a total of 1,002 images of 82 people, with ages ranging from 0 to 69 and an age gap of up to 45 years.

54 papers · 6 benchmarks · Images

Epinions

The Epinions dataset is built from the who-trusts-whom online social network of the general consumer review site Epinions.com. Members of the site can decide whether to "trust" each other. All the trust relationships interact and form the Web of Trust, which is then combined with review ratings to determine which reviews are shown to the user. It contains 75,879 nodes and 508,837 edges.

54 papers · 8 benchmarks · Graphs
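A minimal sketch of loading the trust graph with networkx, assuming the edge list is available locally as a whitespace-separated file (the file name is a placeholder):

```python
# Minimal sketch: read the Epinions who-trusts-whom edge list into a directed graph.
# "epinions_edges.txt" is a placeholder name for a local whitespace-separated edge list.
import networkx as nx

G = nx.read_edgelist("epinions_edges.txt", create_using=nx.DiGraph, comments="#", nodetype=int)

print(G.number_of_nodes(), "nodes")  # expected: 75,879
print(G.number_of_edges(), "edges")  # expected: 508,837
```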

SEMAINE

The SEMAINE videos dataset contains spontaneous data capturing the audiovisual interaction between a human and an operator undertaking the role of an agent (Sensitive Artificial Agent) with four personalities: Poppy (happy), Obadiah (gloomy), Spike (angry) and Prudence (pragmatic). The audiovisual sequences have been recorded at a video rate of 25 fps (352 x 288 pixels). SEMAINE video clips have been annotated with epistemic states such as agreement, interest, certainty, concentration, and thoughtfulness, with continuous ratings in the range [-1, 1], where -1 indicates the most negative rating (e.g. no concentration at all) and +1 the highest (most concentration). Twenty-four recording sessions are used in the Solid SAL scenario. Recordings are made of both the user and the operator, and there are usually four character interactions in each recording session.

54 papers · 4 benchmarks · 3D, Audio, Images, Videos

CrossTask

The CrossTask dataset contains instructional videos collected for 83 different tasks. For each task, an ordered list of steps with manual descriptions is provided. The dataset is divided into two parts: 18 primary and 65 related tasks. Videos for the primary tasks are collected manually and provided with annotations for temporal step boundaries. Videos for the related tasks are collected automatically and do not have annotations.

54 papers · 4 benchmarks · Texts, Videos
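A rough sketch of how a primary-task annotation can be represented (the task, steps, and field names below are invented for illustration): each annotated video carries the ordered steps of its task together with temporal boundaries.

```python
# Illustrative-only CrossTask-style annotation; task, steps, and field names are invented.
annotation = {
    "task": "make pancakes",
    "video_id": "abc123",
    "steps": [
        {"step": "pour flour into bowl", "start": 12.0, "end": 18.5},  # times in seconds
        {"step": "add milk and whisk",   "start": 20.1, "end": 33.0},
        {"step": "fry on the pan",       "start": 41.7, "end": 70.2},
    ],
}

# Steps follow the task's ordered script, so their boundaries should not overlap.
for a, b in zip(annotation["steps"], annotation["steps"][1:]):
    assert a["end"] <= b["start"]
```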
Page 53 of 1000