Datasets

19,997 machine learning datasets

MLQA (MultiLingual Question Answering)

MLQA (MultiLingual Question Answering) is a benchmark dataset for evaluating cross-lingual question answering performance. MLQA consists of over 5K extractive QA instances per language (12K in English), in SQuAD format, across seven languages: English, Arabic, German, Spanish, Hindi, Vietnamese and Simplified Chinese. MLQA is highly parallel, with each QA instance available in 4 different languages on average.
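Because the instances are in SQuAD format, they can be walked with a few nested loops. The following is a minimal sketch assuming the usual SQuAD v1.1 JSON layout (`data` → `paragraphs` → `qas` → `answers`); the `sample` record and the helper name `iter_qa` are hypothetical and invented for illustration, not taken from MLQA itself.

```python
from typing import Iterator, Tuple

# Hypothetical miniature record in SQuAD v1.1 layout (text invented for illustration).
sample = {
    "version": "1.1",
    "data": [
        {
            "title": "Example",
            "paragraphs": [
                {
                    "context": "Hanoi is the capital of Vietnam.",
                    "qas": [
                        {
                            "id": "q1",
                            "question": "What is the capital of Vietnam?",
                            "answers": [{"text": "Hanoi", "answer_start": 0}],
                        }
                    ],
                }
            ],
        }
    ],
}


def iter_qa(dataset: dict) -> Iterator[Tuple[str, str, str, str, int]]:
    """Yield (id, question, context, answer_text, answer_start) for every answer."""
    for article in dataset["data"]:
        for paragraph in article["paragraphs"]:
            context = paragraph["context"]
            for qa in paragraph["qas"]:
                for answer in qa["answers"]:
                    yield (
                        qa["id"],
                        qa["question"],
                        context,
                        answer["text"],
                        answer["answer_start"],
                    )


def span_is_consistent(context: str, text: str, start: int) -> bool:
    """Extractive QA: the answer must be the substring of the context at `start`."""
    return context[start:start + len(text)] == text


rows = list(iter_qa(sample))
for qid, question, context, text, start in rows:
    print(qid, repr(text), span_is_consistent(context, text, start))
```

The span check is useful when loading any extractive QA file, since character offsets are easy to break during preprocessing.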

167 papers | 2 benchmarks | Texts

RLBench

RLBench is an ambitious large-scale benchmark and learning environment designed to facilitate research in a number of vision-guided manipulation research areas, including: reinforcement learning, imitation learning, multi-task learning, geometric computer vision, and in particular, few-shot learning.

167 papers | 8 benchmarks | Environment

COD10K (Camouflaged/Concealed Object Detection)

Sensory ecologists have found that this background-matching camouflage strategy works by deceiving the visual perceptual system of the observer. Naturally, addressing concealed object detection (COD) requires a significant amount of visual perception knowledge. Understanding COD not only has scientific value in itself, but is also important for applications in many fundamental fields, such as computer vision (e.g., for search-and-rescue work, or rare species discovery), medicine (e.g., polyp segmentation, lung infection segmentation), agriculture (e.g., locust detection to prevent invasion), and art (e.g., recreational art). The high intrinsic similarities between the targets and non-targets make COD far more challenging than traditional object segmentation/detection. Although it has gained increased attention recently, studies on COD still remain scarce, mainly due to the lack of a sufficiently large dataset and a standard benchmark like Pascal-VOC, ImageNet, MS-COCO, ADE20K, and DA

166 papers | 28 benchmarks | Images

FDDB (Face Detection Dataset and Benchmark)

The Face Detection Dataset and Benchmark (FDDB) is a collection of labeled faces from the Faces in the Wild dataset. It contains a total of 5171 face annotations, and the images are of various resolutions, e.g. 363x450 and 229x410. The dataset incorporates a range of challenges, including difficult pose angles, out-of-focus faces and low resolution. Both greyscale and color images are included.

165 papers | 12 benchmarks | Images

HELM (Holistic Evaluation of Language Models)

The Holistic Evaluation of Language Models (HELM) is a comprehensive framework developed by Stanford University for evaluating foundation language models. It serves as a living benchmark, promoting transparency in language models.

165 papers | 0 benchmarks

YCB-Video

The YCB-Video dataset is a large-scale video dataset for 6D object pose estimation. It provides accurate 6D poses of 21 objects from the YCB dataset observed in 92 videos with 133,827 frames.
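A 6D pose is conventionally a 3D rotation plus a 3D translation that maps points from the object's model frame into the camera frame. The following is a minimal sketch of applying such a pose to a single point; the helper names `rotation_z` and `apply_pose` are hypothetical, and nothing here reflects the dataset's actual file layout.

```python
import math
from typing import List, Sequence, Tuple

Matrix3 = List[List[float]]


def rotation_z(theta: float) -> Matrix3:
    """3x3 rotation matrix for a rotation of `theta` radians about the z-axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def apply_pose(
    R: Matrix3, t: Sequence[float], point: Sequence[float]
) -> Tuple[float, float, float]:
    """Map a model-frame point into the camera frame: p' = R @ p + t."""
    x, y, z = (
        sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)
    )
    return (x, y, z)


# Illustrative pose: rotate 90 degrees about z, then shift 1 unit along x.
R = rotation_z(math.pi / 2)
t = (1.0, 0.0, 0.0)
transformed = apply_pose(R, t, (1.0, 0.0, 0.0))
print(transformed)
```

Pose estimation methods are typically judged by how closely model points transformed with the predicted pose land to those transformed with the ground-truth pose.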

164 papers | 30 benchmarks | Images, RGB-D, Videos

Sports-1M

The Sports-1M dataset consists of over a million videos from YouTube. The videos in the dataset can be obtained through the YouTube URL specified by the authors. Approximately 7% (as of 2016) of the videos have been removed by the YouTube uploaders since the dataset was compiled. However, there are still over a million videos in the dataset with 487 sports-related categories with 1,000 to 3,000 videos per category. The videos are automatically labelled with 487 sports classes using the YouTube Topics API by analyzing the text metadata associated with the videos (e.g. tags, descriptions). Approximately 5% of the videos are annotated with more than one class.

164 papers | 10 benchmarks | Videos

CelebAMask-HQ

CelebAMask-HQ is a large-scale face image dataset that has 30,000 high-resolution face images selected from the CelebA dataset by following CelebA-HQ. Each image has a segmentation mask of facial attributes corresponding to CelebA.

164 papers | 14 benchmarks | Images

SWAG (Situations With Adversarial Generations)

Given a partial description like "she opened the hood of the car," humans can reason about the situation and anticipate what might come next ("then, she examined the engine"). SWAG (Situations With Adversarial Generations) is a large-scale dataset for this task of grounded commonsense inference, unifying natural language inference and physically grounded reasoning.

163 papers | 3 benchmarks | Texts

IJB-B (IARPA Janus Benchmark-B)

The IJB-B dataset is a template-based face dataset that contains 1845 subjects with 11,754 images, 55,025 frames and 7,011 videos where a template consists of a varying number of still images and video frames from different sources. These images and videos are collected from the Internet and are totally unconstrained, with large variations in pose, illumination, image quality etc. In addition, the dataset comes with protocols for 1-to-1 template-based face verification, 1-to-N template-based open-set face identification, and 1-to-N open-set video face identification.

163 papers | 75 benchmarks | Images, Videos

CORD-19

CORD-19 is a free resource of tens of thousands of scholarly articles about COVID-19, SARS-CoV-2, and related coronaviruses for use by the global research community.

163 papers | 3 benchmarks | Medical, Texts

YouTube-VIS 2019

YouTubeVIS is a new dataset tailored for tasks like simultaneous detection, segmentation and tracking of object instances in videos and is collected based on the current largest video object segmentation dataset YouTubeVOS.

163 papers | 0 benchmarks | Videos

MultiRC (Multi-Sentence Reading Comprehension)

MultiRC (Multi-Sentence Reading Comprehension) is a dataset of short paragraphs and multi-sentence questions, i.e., questions that can be answered by combining information from multiple sentences of the paragraph. The dataset was designed with three key challenges in mind:

* The number of correct answer-options for each question is not pre-specified. This removes the over-reliance on answer-options and forces them to decide on the correctness of each candidate answer independently of others. In other words, the task is not to simply identify the best answer-option, but to evaluate the correctness of each answer-option individually.
* The correct answer(s) is not required to be a span in the text.
* The paragraphs in the dataset have diverse provenance by being extracted from 7 different domains such as news, fiction, historical text etc., and hence are expected to be more diverse in their contents as compared to single-domain datasets.

The entire corpus consists of around 10K questions
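Since every answer-option is judged on its own, a prediction for one question is a list of independent true/false decisions. The following is a minimal sketch of one possible way to score such predictions: an F1 over all answer-options plus a per-question exact match. The function name `score_options` and the toy labels are hypothetical, and this is not claimed to be the benchmark's official scorer.

```python
from typing import Dict, List, Tuple


def score_options(
    gold: Dict[str, List[bool]], pred: Dict[str, List[bool]]
) -> Tuple[float, float]:
    """Return (F1 over all answer-options, fraction of questions fully correct)."""
    tp = fp = fn = 0
    exact = 0
    for qid, gold_labels in gold.items():
        pred_labels = pred[qid]
        for g, p in zip(gold_labels, pred_labels):
            if g and p:
                tp += 1      # correct option marked correct
            elif p and not g:
                fp += 1      # wrong option marked correct
            elif g and not p:
                fn += 1      # correct option missed
        # A question counts as exact only if every option is labeled correctly.
        exact += int(gold_labels == pred_labels)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return f1, exact / len(gold)


# Toy labels invented for illustration: q1 has two correct options, q2 has one.
gold = {"q1": [True, False, True], "q2": [False, True]}
pred = {"q1": [True, False, False], "q2": [False, True]}
f1, exact_match = score_options(gold, pred)
print(f1, exact_match)
```

Scoring options individually rewards partially correct answers to a question, while the exact-match figure shows how often all options for a question were judged correctly at once.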

162 papers | 2 benchmarks | Texts

EPIC-KITCHENS-100

This paper introduces the pipeline to scale the largest dataset in egocentric vision, EPIC-KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M frames, 90K actions in 700 variable-length videos, capturing long-term unscripted activities in 45 environments, using head-mounted cameras. Compared to its previous version (EPIC-KITCHENS-55), EPIC-KITCHENS-100 has been annotated using a novel pipeline that allows denser (54% more actions per minute) and more complete annotations of fine-grained actions (128% more action segments). This collection also enables evaluating the "test of time", i.e. whether models trained on data collected in 2018 can generalise to new footage collected under the same hypotheses albeit "two years on". The dataset is aligned with 6 challenges: action recognition (full and weak supervision), action detection, action anticipation, cross-modal retrieval (from captions), as well as unsupervised domain adaptation for action recognition.

162 papers | 63 benchmarks | Texts, Videos

SALICON (Saliency in Context)

The SALIency in CONtext (SALICON) dataset contains 10,000 training images, 5,000 validation images and 5,000 test images for saliency prediction. This dataset has been created by annotating saliency in images from MS COCO. The ground-truth saliency annotations include fixations generated from mouse trajectories. To improve the data quality, isolated fixations with low local density have been excluded. The training and validation sets, provided with ground truth, contain the following data fields: image, resolution and gaze. The testing data contains only the image and resolution fields.

161 papers | 21 benchmarks | Images

CrowdHuman

CrowdHuman is a large, richly annotated human detection dataset, which contains 15,000, 4,370 and 5,000 images collected from the Internet for training, validation and testing respectively. This number is more than 10× that of previous challenging pedestrian detection datasets such as CityPersons. The total number of persons is also noticeably larger than in the others, with ∼340k person and ∼99k ignore region annotations in the CrowdHuman training subset.

161 papers | 10 benchmarks | Images

Objects365

Objects365 is a large-scale object detection dataset with 365 object categories and over 600K training images. More than 10 million high-quality bounding boxes are manually labeled through a carefully designed three-step annotation pipeline. It is the largest object detection dataset (with full annotation) so far and establishes a more challenging benchmark for the community.

161 papers | 11 benchmarks | Images

V-COCO (Verbs in COCO)

Verbs in COCO (V-COCO) is a dataset that builds off COCO for human-object interaction detection. V-COCO provides 10,346 images (2,533 for training, 2,867 for validation and 4,946 for testing) and 16,199 person instances. Each person has annotations for 29 action categories and there are no interaction labels including objects.

159 papers | 4 benchmarks | Images

VisDial (Visual Dialog)

The Visual Dialog (VisDial) dataset contains human-annotated questions based on images from the MS COCO dataset. This dataset was developed by pairing two subjects on Amazon Mechanical Turk to chat about an image. One person was assigned the job of a ‘questioner’ and the other person acted as an ‘answerer’. The questioner sees only the text description of an image (i.e., an image caption from the MS COCO dataset) and the original image remains hidden to the questioner. Their task is to ask questions about this hidden image to “imagine the scene better”. The answerer sees the image and caption, and answers the questions asked by the questioner. The two of them can continue the conversation by asking and answering questions for at most 10 rounds.

159 papers | 4 benchmarks | Dialog, Images, Texts

PAWS (Paraphrase Adversaries from Word Scrambling)

Paraphrase Adversaries from Word Scrambling (PAWS) is a dataset that contains 108,463 human-labeled and 656k noisily labeled pairs that feature the importance of modeling structure, context, and word order information for the problem of paraphrase identification. The dataset has two subsets, one based on Wikipedia and the other based on the Quora Question Pairs (QQP) dataset.

159 papers | 0 benchmarks | Texts
