Datasets

19,997 machine learning datasets

19,997 dataset results

Multi-News

Multi-News, consists of news articles and human-written summaries of these articles from the site newser.com. Each summary is professionally written by editors and includes links to the original articles cited.

122 papers9 benchmarksTexts

LibriMix

LibriMix is an open-source alternative to wsj0-2mix. Based on LibriSpeech, LibriMix consists of two- or three-speaker mixtures combined with ambient noise samples from WHAM!.

122 papers0 benchmarksSpeech

BBQ (Bias Benchmark for QA)

Bias Benchmark for QA (BBQ) is a dataset consisting of question-sets constructed by the authors that highlight attested social biases against people belonging to protected classes along nine different social dimensions relevant for U.S. English-speaking contexts.

122 papers0 benchmarksTexts

MAWPS (MAth Word ProblemS)

MAWPS is an online repository of Math Word Problems, to provide a unified testbed to evaluate different algorithms. MAWPS allows for the automatic construction of datasets with particular characteristics, providing tools for tuning the lexical and template overlap of a dataset as well as for filtering ungrammatical problems from web-sourced corpora. The online nature of this repository facilitates easy community contribution. Amassed 3,320 problems, including the full datasets used in several previous works.

122 papers4 benchmarks

SemEval-2010 Task-8

The dataset for the SemEval-2010 Task 8 is a dataset for multi-way classification of mutually exclusive semantic relations between pairs of nominals.

121 papers1 benchmarks

GENIA

The GENIA corpus is the primary collection of biomedical literature compiled and annotated within the scope of the GENIA project. The corpus was created to support the development and evaluation of information extraction and text mining systems for the domain of molecular biology.

121 papers5 benchmarksMedical, Texts

Permuted MNIST

Permuted MNIST is an MNIST variant that consists of 70,000 images of handwritten digits from 0 to 9, where 60,000 images are used for training, and 10,000 images for test. The difference of this dataset from the original MNIST is that each of the ten tasks is the multi-class classification of a different random permutation of the input pixels.

121 papers4 benchmarksImages

ETH3D

ETHD is a multi-view stereo benchmark / 3D reconstruction benchmark that covers a variety of indoor and outdoor scenes. Ground truth geometry has been obtained using a high-precision laser scanner. A DSLR camera as well as a synchronized multi-camera rig with varying field-of-view was used to capture images.

121 papers7 benchmarksImages, RGB-D, Stereo

GTEA (Georgia Tech Egocentric Activity)

The Georgia Tech Egocentric Activities (GTEA) dataset contains seven types of daily activities such as making sandwich, tea, or coffee. Each activity is performed by four different people, thus totally 28 videos. For each video, there are about 20 fine-grained action instances such as take bread, pour ketchup, in approximately one minute.

120 papers20 benchmarksImages, Videos

SimpleQuestions

SimpleQuestions is a large-scale factoid question answering dataset. It consists of 108,442 natural language questions, each paired with a corresponding fact from Freebase knowledge base. Each fact is a triple (subject, relation, object) and the answer to the question is always the object. The dataset is divided into training, validation, and test sets with 75,910, 10,845 and 21,687 questions respectively.

120 papers2 benchmarksTexts

SHAPES (Swarm Heuristics based Adaptive and Penalized Estimation of Splines)

SHAPES is a dataset of synthetic images designed to benchmark systems for understanding of spatial and logical relations among multiple objects. The dataset consists of complex questions about arrangements of colored shapes. The questions are built around compositions of concepts and relations, e.g. Is there a red shape above a circle? or Is a red shape blue?. Questions contain between two and four attributes, object types, or relationships. There are 244 questions and 15,616 images in total, with all questions having a yes and no answer (and corresponding supporting image). This eliminates the risk of learning biases.

120 papers2 benchmarksImages, Texts

SIQA (Social Interaction QA)

Social Interaction QA (SIQA) is a question-answering benchmark for testing social commonsense intelligence. Contrary to many prior benchmarks that focus on physical or taxonomic knowledge, Social IQa focuses on reasoning about people’s actions and their social implications. For example, given an action like "Jesse saw a concert" and a question like "Why did Jesse do this?", humans can easily infer that Jesse wanted "to see their favorite performer" or "to enjoy the music", and not "to see what's happening inside" or "to see if it works". The actions in Social IQa span a wide variety of social situations, and answer candidates contain both human-curated answers and adversarially-filtered machine-generated candidates. Social IQa contains over 37,000 QA pairs for evaluating models’ abilities to reason about the social implications of everyday events and situations.

120 papers1 benchmarksTexts

Raindrop

Raindrop is a set of image pairs, where each pair contains exactly the same background scene, yet one is degraded by raindrops and the other one is free from raindrops. To obtain this, the images are captured through two pieces of exactly the same glass: one sprayed with water, and the other is left clean. The dataset consists of 1,119 pairs of images, with various background scenes and raindrops. They were captured with a Sony A6000 and a Canon EOS 60.

120 papers2 benchmarksImages

shape bias

The 'shape bias' dataset was introduced in Geirhos et al. (ICLR 2019) and consists of 224x224 images with conflicting texture and shape information (e.g., cat shape with elephant texture). This is used to measure the shape vs. texture bias of image classifiers.

120 papers1 benchmarksImages

SEED (SJTU Emotion EEG Dataset)

The SEED dataset contains subjects' EEG signals when they were watching films clips. The film clips are carefully selected so as to induce different types of emotion, which are positive, negative, and neutral ones.

119 papers1 benchmarksEEG

MAESTRO

The MAESTRO dataset contains over 200 hours of paired audio and MIDI recordings from ten years of International Piano-e-Competition. The MIDI data includes key strike velocities and sustain/sostenuto/una corda pedal positions. Audio and MIDI files are aligned with ∼3 ms accuracy and sliced to individual musical pieces, which are annotated with composer, title, and year of performance. Uncompressed audio is of CD quality or higher (44.1–48 kHz 16-bit PCM stereo).

118 papers1 benchmarksAudio, Interactive, Midi, Music

VATEX (Video And TEXt)

VATEX is multilingual, large, linguistically complex, and diverse dataset in terms of both video and natural language descriptions. It has two tasks for video-and-language research: (1) Multilingual Video Captioning, aimed at describing a video in various languages with a compact unified captioning model, and (2) Video-guided Machine Translation, to translate a source language description into the target language using the video information as additional spatiotemporal context.

118 papers26 benchmarksTexts, Videos

WritingPrompts

WritingPrompts is a large dataset of 300K human-written stories paired with writing prompts from an online forum.

118 papers6 benchmarksTexts

AFLW2000-3D

AFLW2000-3D is a dataset of 2000 images that have been annotated with image-level 68-point 3D facial landmarks. This dataset is used for evaluation of 3D facial landmark detection models. The head poses are very diverse and often hard to be detected by a CNN-based face detector.

117 papers45 benchmarks3D, Images

Something-Something V1

The 20BN-SOMETHING-SOMETHING dataset is a large collection of labeled video clips that show humans performing pre-defined basic actions with everyday objects. The dataset was created by a large number of crowd workers. It allows machine learning models to develop fine-grained understanding of basic actions that occur in the physical world. It contains 108,499 videos, with 86,017 in the training set, 11,522 in the validation set and 10,960 in the test set. There are 174 labels.

117 papers11 benchmarksImages, Videos

PreviousPage 29 of 1000Next