19,997 machine learning datasets
19,997 dataset results
Quora Question Pairs (QQP) dataset consists of over 400,000 question pairs, and each question pair is annotated with a binary value indicating whether the two questions are paraphrase of each other.
The REVERB (REverberant Voice Enhancement and Recognition Benchmark) challenge is a benchmark for evaluation of automatic speech recognition techniques. The challenge assumes the scenario of capturing utterances spoken by a single stationary distant-talking speaker with 1-channe, 2-channel or 8-channel microphone-arrays in reverberant meeting rooms. It features both real recordings and simulated data.
QUASAR-T is a large-scale dataset aimed at evaluating systems designed to comprehend a natural language query and extract its answer from a large corpus of text. It consists of 43,013 open-domain trivia questions and their answers obtained from various internet sources. ClueWeb09 serves as the background corpus for extracting these answers. The answers to these questions are free-form spans of text, though most are noun phrases.
Over a period of many years during the 1990s, a large group of psychologists all over the world collected data in the ISEAR project, directed by Klaus R. Scherer and Harald Wallbott. Student respondents, both psychologists and non-psychologists, were asked to report situations in which they had experienced all of 7 major emotions (joy, fear, anger, sadness, disgust, shame, and guilt). In each case, the questions covered the way they had appraised the situation and how they reacted. The final data set thus contained reports on seven emotions each by close to 3000 respondents in 37 countries on all 5 continents.
MuTual is a retrieval-based dataset for multi-turn dialogue reasoning, which is modified from Chinese high school English listening comprehension test data. It tests dialogue reasoning via next utterance prediction.
The Cross-lingual TRansfer Evaluation of Multilingual Encoders (XTREME) benchmark was introduced to encourage more research on multilingual transfer learning,. XTREME covers 40 typologically diverse languages spanning 12 language families and includes 9 tasks that require reasoning about different levels of syntax or semantics.
OpenDialKG contains utterance from 15K human-to-human role-playing dialogs is manually annotated with ground-truth reference to corresponding entities and paths from a large-scale KG with 1M+ facts.
Fisheye cameras are commonly employed for obtaining a large field of view in surveillance, augmented reality and in particular automotive applications. In spite of its prevalence, there are few public datasets for detailed evaluation of computer vision algorithms on fisheye images. WoodScape is an extensive fisheye automotive dataset named after Robert Wood who invented the fisheye camera in 1906. WoodScape comprises of four surround view cameras and nine tasks including segmentation, depth estimation, 3D bounding box detection and soiling detection. Semantic annotation of 40 classes at the instance level is provided for over 10,000 images and annotation for other tasks are provided for over 100,000 images.
We present a benchmark for image-based 3D reconstruction. The benchmark sequences were acquired outside the lab, in realistic conditions. Ground-truth data was captured using an industrial laser scanner. The benchmark includes both outdoor scenes and indoor environments. High-resolution video sequences are provided as input, supporting the development of novel pipelines that take advantage of video input to increase reconstruction fidelity. We report the performance of many image-based 3D reconstruction pipelines on the new benchmark. The results point to exciting challenges and opportunities for future work.
The Shifts Dataset is a dataset for evaluation of uncertainty estimates and robustness to distributional shift. The dataset, which has been collected from industrial sources and services, is composed of three tasks, with each corresponding to a particular data modality: tabular weather prediction, machine translation, and self-driving car (SDC) vehicle motion prediction. All of these data modalities and tasks are affected by real, `in-the-wild' distributional shifts and pose interesting challenges with respect to uncertainty estimation.
Vision and Language Navigation in Continuous Environments (VLN-CE) is an instruction-guided navigation task with crowdsourced instructions, realistic environments, and unconstrained agent navigation. The dataset consists of 4475 trajectories converted from Room-to-Room train and validation splits. For each trajectory, multiple natural language instructions from Room-to-Room and a pre-computed shortest path are provided following the waypoints via low-level actions.
Emotion-cause pair extraction (ECPE) aims to extract the potential pairs of emotions and corresponding causes in a document. This dataset consists of 1,945 Chinese documents from SINA NEWS website.
PrOntoQA is a question-answering dataset which generates examples with chains-of-thought that describe the reasoning required to answer the questions correctly. The sentences in the examples are syntactically simple and amenable to semantic parsing. It can be used to formally analyze the predicted chain-of-thought from large language models such as GPT-3.
PMC-VQA is a large-scale medical visual question-answering dataset that contains 227k VQA pairs of 149k images that cover various modalities or diseases. The question-answer pairs are generated from PMC-OA.
Tiny ImageNet-C is an open-source data set comprising algorithmically generated corruptions applied to the Tiny ImageNet (ImageNet-200) test set comprising 200 classes following the concept of ImageNet-C. It was introduced by Hendrycks et al. ("Benchmarking Neural Network Robustness to Common Corruptions and Perturbations") and comprises 19 different corruptions (15 test corruptions and 4 validation corruptions) spanning 5 severity levels. This results in 200,000 images for the validation set and 750,000 images for the test set. For further information visit the original GitHub repository of ImageNet-C.
DAQUAR (DAtaset for QUestion Answering on Real-world images) is a dataset of human question answer pairs about images.
FGNet is a dataset for age estimation and face recognition across ages. It is composed of a total of 1,002 images of 82 people with age range from 0 to 69 and an age gap up to 45 years
The Epinions dataset is built form a who-trust-whom online social network of a general consumer review site Epinions.com. Members of the site can decide whether to ''trust'' each other. All the trust relationships interact and form the Web of Trust which is then combined with review ratings to determine which reviews are shown to the user. It contains 75,879 nodes and 50,8837 edges.
The SEMAINE videos dataset contains spontaneous data capturing the audiovisual interaction between a human and an operator undertaking the role of an avatar with four personalities: Poppy (happy), Obadiah (gloomy), Spike (angry) and Prudence (pragmatic). The audiovisual sequences have been recorded at a video rate of 25 fps (352 x 288 pixels). The dataset consists of audiovisual interaction between a human and an operator undertaking the role of an agent (Sensitive Artificial Agent). SEMAINE video clips have been annotated with couples of epistemic states such as agreement, interested, certain, concentration, and thoughtful with continuous rating (within the range [1,-1]) where -1 indicates most negative rating (i.e: No concentration at all) and +1 defines the highest (Most concentration). Twenty-four recording sessions are used in the Solid SAL scenario. Recordings are made of both the user and the operator, and there are usually four character interactions in each recording session,
CrossTask dataset contains instructional videos, collected for 83 different tasks. For each task an ordered list of steps with manual descriptions is provided. The dataset is divided in two parts: 18 primary and 65 related tasks. Videos for the primary tasks are collected manually and provided with annotations for temporal step boundaries. Videos for the related tasks are collected automatically and don't have annotations.