19,997 machine learning datasets
SIM10k is a synthetic dataset of 10,000 images rendered from the video game Grand Theft Auto V (GTA V).
The TGIF-QA dataset contains 165K QA pairs for the animated GIFs from the TGIF dataset [Li et al. CVPR 2016]. The question & answer pairs are collected via crowdsourcing with a carefully designed user interface to ensure quality. The dataset can be used to evaluate video-based Visual Question Answering techniques.
Children’s Book Test (CBT) is designed to measure directly how well language models can exploit wider linguistic context. The CBT is built from books that are freely available thanks to Project Gutenberg.
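Concretely, each CBT question is built from 21 consecutive sentences: the first 20 form the context, and one word is removed from the 21st to form a cloze query answered from 10 candidate words. Below is a minimal sketch of that construction; the function name and the uniform candidate sampling are simplifications (the real CBT constrains the removed word and the candidates to the same word class):

```python
import random

def make_cbt_style_example(sentences, candidate_pool, answer_index=-1):
    """Build a CBT-style cloze query from 21 consecutive sentences.

    Simplified sketch: the real CBT controls the word type of the
    removed word and draws candidates of the same type.
    """
    assert len(sentences) == 21
    context, query_sentence = sentences[:20], sentences[20]
    tokens = query_sentence.split()
    answer = tokens[answer_index]      # the word to hide
    tokens[answer_index] = "XXXXX"     # placeholder for the blank
    candidates = random.sample(candidate_pool, 9) + [answer]
    random.shuffle(candidates)
    return {"context": context,
            "query": " ".join(tokens),
            "candidates": candidates,
            "answer": answer}
```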
VITON was a dataset for virtual try-on of clothing items. It comprised 16,253 image pairs, each showing a person and a clothing item.
Multi-view Emotional Audio-visual Dataset
JFLEG is a corpus for developing and evaluating grammatical error correction (GEC) systems. Unlike other corpora, it represents a broad range of language proficiency levels and uses holistic fluency edits that not only correct grammatical errors but also make the original text sound more native.
The MIT-States dataset has 245 object classes, 115 attribute classes and ∼53K images. There is a wide range of objects (e.g., fish, persimmon, room) and attributes (e.g., mossy, deflated, dirty). On average, each object instance is modified by one of the 9 attributes it affords.
FaceWarehouse is a 3D facial expression database that provides the facial geometry of 150 subjects, covering a wide range of ages and ethnic backgrounds.
WikiMatrix is a dataset of parallel sentences mined from the textual content of Wikipedia for all possible language pairs. The mined data consists of 135M parallel sentences in 1,620 language pairs, spanning 85 languages, of which 34M sentences are aligned with English.
This dataset is used for evaluation in the paper "Structure-Grounded Pretraining for Text-to-SQL". It is created from the dev split of the Spider dataset (2020-06-07 version from https://yale-lily.github.io/spider). We manually modified the original questions to remove explicit mentions of column names, while keeping the SQL queries unchanged, to better evaluate the model's capability in aligning the NL utterance and the DB schema. For more details, please check our paper at https://arxiv.org/abs/2010.12773.
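For illustration, here is a hypothetical before/after pair in the spirit of that modification (not drawn from the released data): the explicit column mention is dropped from the question while the SQL stays fixed.

```python
# Hypothetical example illustrating the edit described above: the explicit
# column mention ("age") is removed from the question, while the SQL query
# is left unchanged.
example = {
    "original_question": "What is the average age of all singers?",
    "modified_question": "How old are the singers on average?",
    "sql": "SELECT avg(age) FROM singer",
}
```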
A large-scale multi-object tracking dataset for tracking humans under occlusion, frequent crossovers, uniform appearance, and diverse body gestures. It is proposed to emphasize the importance of motion analysis in multi-object tracking, as opposed to paradigms that rely mainly on appearance matching.
PopQA is an open-domain QA dataset of 14k question-answer pairs, each annotated with a fine-grained Wikidata entity ID, Wikipedia page views, and relationship-type information.
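A quick way to inspect it is via the Hugging Face datasets library; the hub id and field names below are assumptions based on the description above and may need adjusting:

```python
from datasets import load_dataset

# Hub id is an assumption; adjust if the hub entry differs.
popqa = load_dataset("akariasai/PopQA", split="test")
row = popqa[0]
# Fields such as "question", "possible_answers", "subj_id" (Wikidata entity),
# "s_pop" (page views), and "prop" (relationship type) are assumptions.
print(row["question"])
```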
The Oxford-IIIT Pet Dataset has 37 categories with roughly 200 images for each class. The images have large variations in scale, pose and lighting. All images have an associated ground truth annotation of breed, head ROI, and pixel-level trimap segmentation.
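One convenient way to load it is through torchvision, which ships an OxfordIIITPet dataset class (available in torchvision 0.12 and later); the root path below is arbitrary:

```python
import torchvision

# download=True fetches the images and annotations on first use.
pets = torchvision.datasets.OxfordIIITPet(
    root="data",
    split="trainval",
    target_types=["category", "segmentation"],  # breed label + trimap mask
    download=True,
)
image, (breed, trimap) = pets[0]
```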
The End-to-End NLG Challenge (E2E) aims to assess whether recent end-to-end NLG systems can generate more complex output by learning from datasets with higher lexical richness, syntactic complexity, and diverse discourse phenomena.
ST-VQA aims to highlight the importance of exploiting high-level semantic information present in images as textual cues in the VQA process.
Winoground is a dataset for evaluating the ability of vision and language models to conduct visio-linguistic compositional reasoning. Given two images and two captions, the goal is to match them correctly -- but crucially, both captions contain exactly the same words, only in a different order. The dataset was carefully hand-curated by expert annotators and is labeled with a rich set of fine-grained tags to assist in analyzing model performance.
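The matching task can be scored per example. Below is a minimal sketch of that check, following the text/image/group breakdown used in the Winoground paper; `score(image, caption)` stands in for any model's similarity function and is an assumption of the sketch:

```python
def winoground_style_scores(examples, score):
    """Text/image/group accuracies for a two-image/two-caption matching task.

    Each example is assumed to be a dict with images "i0", "i1" and their
    matching captions "c0", "c1".
    """
    text_ok = image_ok = group_ok = 0
    for ex in examples:
        i0, i1, c0, c1 = ex["i0"], ex["i1"], ex["c0"], ex["c1"]
        # Text score: each image prefers its own caption.
        t = score(i0, c0) > score(i0, c1) and score(i1, c1) > score(i1, c0)
        # Image score: each caption prefers its own image.
        v = score(i0, c0) > score(i1, c0) and score(i1, c1) > score(i0, c1)
        text_ok += t
        image_ok += v
        group_ok += t and v
    n = len(examples)
    return text_ok / n, image_ok / n, group_ok / n
```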
The MSU-MFSD dataset contains 280 video recordings of genuine and attack faces from 35 individuals. Two cameras with different resolutions (720×480 and 640×480) were used to record the videos. For the real accesses, each individual has two video recordings, captured with a laptop camera and an Android phone, respectively. For the video attacks, two types of cameras, an iPhone and a Canon camera, were used to capture high-definition videos of each subject. The videos taken with the Canon camera were replayed on an iPad Air screen to generate the HD replay attacks, while the videos recorded with the iPhone were replayed on the iPhone itself to generate the mobile replay attacks. Photo attacks were produced by printing the 35 subjects' photos on A3 paper with an HP colour printer. The videos of the 35 individuals were divided into training (15 subjects with 120 videos) and testing (20 subjects with 160 videos) subsets.
COCO-Text is a dataset for text detection and recognition. It is based on the MS COCO dataset, which contains images of complex everyday scenes, and it includes non-text images, legible-text images, and illegible-text images. In total there are 22,184 training images and 7,026 validation images with at least one instance of legible text.
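A minimal sketch of selecting the legible-text annotations from the released JSON file; the file name and the `anns` / `legibility` / `utf8_string` fields reflect the released annotation format as I understand it:

```python
import json

# Load the released COCO-Text annotation file (assumed to sit alongside
# the script) and keep only annotations marked as legible.
with open("COCO_Text.json") as f:
    ct = json.load(f)

legible = [a for a in ct["anns"].values() if a["legibility"] == "legible"]
print(len(legible), "legible text instances")
# Legible annotations carry a transcription and a bounding box.
print(legible[0].get("utf8_string"), legible[0]["bbox"])
```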
PathVQA consists of 32,799 open-ended questions from 4,998 pathology images where each question is manually checked to ensure correctness.