3,275 machine learning datasets
The CLEVR-Hans dataset is a novel confounded visual scene dataset that captures complex compositions of different objects. It consists of CLEVR images divided into several classes.
The Casual Conversations dataset is designed to help researchers evaluate their computer vision and audio models for accuracy across a diverse set of ages, genders, apparent skin tones, and ambient lighting conditions.
NewsCLIPpings is a dataset for detecting mismatched images and captions. Unlike previous misinformation datasets, in NewsCLIPpings both the images and captions are unmanipulated, but some pairs are mismatched.
The SketchyCOCO dataset consists of two parts:
e-SNLI-VE is a large vision-language (VL) dataset with natural language explanations (NLEs), containing over 430k instances whose explanations rely on the image content. It was built by merging the explanations from e-SNLI with the image-sentence pairs from SNLI-VE.
A large-scale database for Text-to-Image Person Re-identification, i.e., Text-based Person Retrieval.
An evaluation protocol for face verification focusing on large intra-pair image quality differences.
Wukong is a large-scale Chinese cross-modal dataset for benchmarking multi-modal pre-training methods, intended to facilitate vision-language pre-training (VLP). It contains 100 million Chinese image-text pairs collected from the web. The base query list is filtered according to the frequency of Chinese words and phrases.
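The frequency-based query filtering described above could be sketched as follows; the function name, threshold, and inputs are illustrative assumptions, not the dataset's actual collection pipeline.

```python
from collections import Counter

def filter_query_list(queries, corpus_tokens, min_freq=5):
    """Keep only queries that occur at least min_freq times in the corpus.

    min_freq is a hypothetical threshold for illustration; the real
    pipeline's cutoff is not stated in the description above.
    """
    counts = Counter(corpus_tokens)
    return [q for q in queries if counts[q] >= min_freq]

# Toy usage: only the frequent word survives the filter.
frequent = filter_query_list(
    ["猫", "狗", "量子"],
    ["猫"] * 10 + ["狗"] * 2,
    min_freq=5,
)
```

Note that `Counter` returns 0 for unseen tokens, so queries absent from the corpus are dropped automatically.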
The SKM-TEA dataset pairs raw quantitative knee MRI (qMRI) data, image data, and dense labels of tissues and pathology for end-to-end exploration and evaluation of the MR imaging pipeline. This 1.6TB dataset consists of raw-data measurements of ~25,000 slices (155 patients) of anonymized patient knee MRI scans, the corresponding scanner-generated DICOM images, manual segmentations of four tissues, and bounding box annotations for sixteen clinically relevant pathologies.
CholecT45 is a subset of CholecT50 consisting of 45 videos from the Cholec80 dataset; it is the first public release of part of the CholecT50 dataset. CholecT50 is a dataset of 50 endoscopic videos of laparoscopic cholecystectomy surgery, introduced to enable research on fine-grained action recognition in laparoscopic surgery. It is annotated with 100 triplet classes in the form <instrument, verb, target>.
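A triplet class in the <instrument, verb, target> form could be handled with a small parser like this; the exact string format of the annotation files is an assumption based on the description above, not the dataset's documented schema.

```python
def parse_triplet(label: str) -> tuple:
    """Split a '<instrument, verb, target>' label into its three parts."""
    return tuple(part.strip() for part in label.strip("<>").split(","))

# Hypothetical example label in the stated format:
triplet = parse_triplet("<grasper, retract, gallbladder>")
# → ('grasper', 'retract', 'gallbladder')
```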
The General Robust Image Task (GRIT) Benchmark is an evaluation-only benchmark for measuring the performance and robustness of vision systems across multiple image prediction tasks, concepts, and data sources. GRIT aims to encourage the research community to pursue the following research directions:
VideoLQ consists of videos downloaded from various video hosting sites such as Flickr and YouTube, with a Creative Commons license.
GSV-Cities is a large-scale dataset for training deep neural networks for the task of Visual Place Recognition.
V-D4RL provides pixel-based analogues of the popular D4RL benchmarking tasks, derived from the dm_control suite, along with natural extensions of two state-of-the-art online pixel-based continuous control algorithms, DrQ-v2 and DreamerV2, to the offline setting.
The GeneCIS benchmark is designed to measure models' ability to adapt to a range of similarity conditions; it is intended for zero-shot evaluation only.
This dataset, based on Flickr30K, was introduced in Learning with Noisy Correspondence for Cross-modal Matching. Noisy correspondence is simulated by randomly shuffling the captions of a specified percentage of training images; this percentage is called the noise ratio.
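The caption-shuffling procedure can be simulated as below; the function and argument names are assumptions for illustration, not the paper's released code.

```python
import random

def inject_noise(captions, noise_ratio, seed=0):
    """Randomly permute the captions of a noise_ratio fraction of images,
    producing mismatched (noisy) image-caption pairs."""
    rng = random.Random(seed)
    n = len(captions)
    selected = rng.sample(range(n), int(n * noise_ratio))
    permuted = selected[:]
    rng.shuffle(permuted)
    noisy = list(captions)
    for i, j in zip(selected, permuted):
        noisy[i] = captions[j]  # caption j now (mis)matches image i
    return noisy
```

With a noise ratio of 0 the pairs are untouched; at higher ratios a growing fraction of images carry a caption drawn from a different image.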
The realistic and dynamic scenes (REDS) dataset was proposed in the NTIRE19 challenge. It is composed of 300 video sequences at a resolution of 720×1,280, each with 100 frames; the training, validation, and test sets contain 240, 30, and 30 videos, respectively.
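The 240/30/30 split can be reproduced over sequence indices as a quick sanity check; the contiguous index ordering here is an assumption, and the official split files should be preferred in practice.

```python
# 300 REDS sequences, split 240 / 30 / 30 (ordering assumed contiguous).
seq_ids = [f"{i:03d}" for i in range(300)]
train, val, test = seq_ids[:240], seq_ids[240:270], seq_ids[270:]
```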
The dataset is constructed from images of defective production items that were provided and annotated by Kolektor Group d.o.o. The images were captured in a controlled industrial environment in a real-world case.
QED is a linguistically principled framework for explanations in question answering. Given a question and a passage, QED represents an explanation of the answer as a combination of discrete, human-interpretable steps: (1) sentence selection: identification of a sentence implying an answer to the question; (2) referential equality: identification of noun phrases in the question and the answer sentence that refer to the same thing; (3) predicate entailment: confirmation that the predicate in the sentence entails the predicate in the question once referential equalities are abstracted away. The QED dataset is an expert-annotated dataset of QED explanations built upon a subset of the Google Natural Questions dataset.
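A single QED-style explanation might be represented roughly as follows; the field names are a hypothetical sketch of the three steps, not the dataset's actual annotation schema.

```python
# Hypothetical structure mirroring the three QED steps; the field names
# are assumptions, not the released dataset's JSON format.
qed_explanation = {
    "question": "who wrote the novel moby dick",
    # (1) sentence selection: the passage sentence implying the answer
    "selected_sentence": (
        "Moby-Dick is an 1851 novel by American writer Herman Melville."
    ),
    # (2) referential equality: phrases in the question and the sentence
    #     that refer to the same entity
    "referential_equalities": [
        {"question_phrase": "the novel moby dick",
         "sentence_phrase": "Moby-Dick"},
    ],
    # (3) predicate entailment holds once the equated phrases are
    #     abstracted away, yielding the answer
    "answer": "Herman Melville",
}
```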
Syn2Real is a synthetic-to-real visual domain adaptation benchmark meant to encourage further development of robust domain transfer methods. The goal is to train a model on a synthetic "source" domain and then update it so that its performance improves on a real "target" domain, without using any target annotations. It includes three tasks, illustrated in the figures above: the more traditional closed-set classification task with a known set of categories; the less studied open-set classification task with unknown object categories in the target domain; and the object detection task, which involves localizing instances of objects by predicting their bounding boxes and corresponding class labels.