Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

19,997 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)


ETHICS

The ETHICS benchmark spans concepts in justice, well-being, duties, virtues, and commonsense morality.

183 papers · 0 benchmarks · Texts

SA-1B

SA-1B consists of 11M diverse, high-resolution, licensed, privacy-protecting images and 1.1B high-quality segmentation masks.

183 papers · 4 benchmarks · Images
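
SA-1B distributes its masks as COCO-style run-length encodings (RLE) in per-image JSON files. A minimal decoding sketch, assuming that layout; the file name below is hypothetical:

```python
# Minimal sketch: decode SA-1B masks, assuming COCO-style RLE stored in
# per-image JSON files (the file name is hypothetical).
import json
from pycocotools import mask as mask_utils  # pip install pycocotools

with open("sa_000001.json") as f:
    annotations = json.load(f)["annotations"]

for ann in annotations:
    binary_mask = mask_utils.decode(ann["segmentation"])  # HxW uint8 array
    print(binary_mask.shape, binary_mask.sum())
```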

OTB-2015

OTB-2015, also referred to as the Visual Tracker Benchmark, is a visual tracking dataset. It contains 100 commonly used video sequences for evaluating visual tracking. Image source: http://cvlab.hanyang.ac.kr/tracker_benchmark/datasets.html

182 papers · 4 benchmarks · Images, Videos

MARS (Motion Analysis and Re-identification Set)

MARS (Motion Analysis and Re-identification Set) is a large-scale, video-based person re-identification dataset and an extension of the Market-1501 dataset. It was collected from six near-synchronized cameras and consists of 1,261 different pedestrians, each captured by at least 2 cameras. Variations in pose, color, and illumination, as well as poor image quality, make it very difficult to achieve high matching accuracy. Moreover, the dataset contains 3,248 distractors to make it more realistic. The Deformable Part Model and the GMMCP tracker were used to automatically generate the tracklets (mostly 25-50 frames long).

181 papers · 6 benchmarks · Videos

HICO-DET

HICO-DET is a dataset for detecting human-object interactions (HOI) in images. It contains 47,776 images (38,118 in the train set and 9,658 in the test set) and 600 HOI categories formed from 80 object categories and 117 verb classes, with more than 150k annotated human-object pairs. By comparison, V-COCO provides 10,346 images (2,533 for training, 2,867 for validation, and 4,946 for testing) and 16,199 person instances; each person is annotated for 29 action categories, and there are no interaction labels involving objects.

180 papers · 19 benchmarks · Images
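
Since each HOI category pairs one of the 117 verbs with one of the 80 object categories, a single annotated human-object pair can be pictured as below; the field names are illustrative, not HICO-DET's official annotation format:

```python
# Illustrative structure of one annotated human-object pair; field names
# are ours, not HICO-DET's official annotation format.
hoi_pair = {
    "human_bbox": [120, 40, 260, 380],    # [x1, y1, x2, y2]
    "object_bbox": [200, 150, 420, 360],
    "object_category": "bicycle",         # one of 80 object categories
    "verb": "ride",                       # one of 117 verb classes
}
```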

MORPH

MORPH is a facial age estimation dataset, which contains 55,134 facial images of 13,617 subjects ranging from 16 to 77 years old.

180 papers · 18 benchmarks · Images

Hate Speech

A dataset of English Internet forum posts annotated for hate speech at the sentence level. The source forum is Stormfront, a large online community of white nationalists. A total of 10,568 sentences have been extracted from Stormfront and classified as conveying hate speech or not.

180 papers · 0 benchmarks · Texts

APPS (Automated Programming Progress Standard)

The APPS dataset consists of problems collected from open-access coding websites such as Codeforces and Kattis. The benchmark attempts to mirror how human programmers are evaluated, posing coding problems in unrestricted natural language and evaluating the correctness of solutions. The problems range in difficulty from introductory to collegiate-competition level and measure both coding ability and problem-solving.

180 papers · 13 benchmarks · Texts
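
Correctness on APPS is judged by executing a candidate program against input/output test cases. A simplified sketch of that loop; the file name and test-case format are illustrative, not the official harness:

```python
# Simplified sketch of APPS-style evaluation: run a candidate program
# against stdin/stdout test cases and require every output to match.
import subprocess

def passes_all_tests(solution_path, test_cases, timeout=4):
    """Return True iff the program's stdout matches every expected output."""
    for stdin, expected in test_cases:
        try:
            result = subprocess.run(
                ["python", solution_path],
                input=stdin, capture_output=True, text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False
        if result.stdout.strip() != expected.strip():
            return False
    return True

# e.g. a problem whose answer is the sum of two integers on one line
print(passes_all_tests("candidate.py", [("1 2\n", "3\n")]))
```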

LRA (Long-Range Arena)

Long-Range Arena (LRA) is an effort toward systematic evaluation of efficient Transformer models. The project establishes benchmark tasks/datasets for evaluating Transformer-based models in a systematic way, assessing their generalization power, computational efficiency, memory footprint, etc. Long-Range Arena focuses specifically on evaluating model quality under long-context scenarios. The benchmark is a suite of tasks with sequences ranging from 1K to 16K tokens, encompassing a wide range of data types and modalities such as text, natural and synthetic images, and mathematical expressions requiring similarity, structural, and visuospatial reasoning.

180 papers · 7 benchmarks · Texts

ShapeNetCore

ShapeNetCore is a subset of the full ShapeNet dataset with single clean 3D models and manually verified category and alignment annotations. It covers 55 common object categories with about 51,300 unique 3D models. The 12 object categories of PASCAL 3D+, a popular computer vision 3D benchmark dataset, are all covered by ShapeNetCore.

180 papers · 3 benchmarks · 3D, 3D meshes
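
The models ship as meshes organized by category synset id, so any generic mesh library can read them. A sketch with trimesh, assuming the ShapeNetCore.v2 directory layout; the model id is a placeholder:

```python
# Sketch: load one ShapeNetCore model with trimesh (pip install trimesh).
# The path assumes the v2 release layout; <model_id> is a placeholder and
# 02691156 is the airplane synset.
import trimesh

path = "ShapeNetCore.v2/02691156/<model_id>/models/model_normalized.obj"
mesh = trimesh.load(path, force="mesh")   # merge OBJ sub-geometries
print(mesh.vertices.shape, mesh.faces.shape)
```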

IFEval (Instruction Following Evaluation Dataset)

This dataset evaluates the instruction-following ability of large language models. It contains 500+ prompts with verifiable instructions such as "write an article with more than 800 words" and "wrap your response with double quotation marks".

180 papers · 4 benchmarks · Texts
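
Because every instruction is mechanically verifiable, scoring needs no model judge. Two simplified stand-in checkers for the quoted examples; the real benchmark ships its own rule-based verifiers:

```python
# Simplified stand-ins for two IFEval-style verifiers; the official
# benchmark implements its own rule-based checks.
def min_word_count(response: str, n: int = 800) -> bool:
    """Checks 'write an article with more than 800 words'."""
    return len(response.split()) > n

def wrapped_in_double_quotes(response: str) -> bool:
    """Checks 'wrap your response with double quotation marks'."""
    s = response.strip()
    return len(s) >= 2 and s.startswith('"') and s.endswith('"')
```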

VCR (Visual Commonsense Reasoning)

Visual Commonsense Reasoning (VCR) is a large-scale dataset for cognition-level visual understanding. Given a challenging question about an image, a machine must perform two sub-tasks: answer correctly and provide a rationale justifying its answer. The VCR dataset contains over 212K (training), 26K (validation), and 25K (testing) questions, answers, and rationales derived from 110K movie scenes.

179 papers · 1 benchmark · Images, Texts
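
VCR's strictest metric (Q→AR) credits a prediction only when both sub-tasks succeed. A minimal sketch of that joint score, with illustrative field names:

```python
# Minimal sketch of VCR's joint Q->AR metric: a prediction counts only if
# both the chosen answer and the chosen rationale are correct.
def q2ar_accuracy(predictions, gold):
    hits = sum(
        p["answer"] == g["answer"] and p["rationale"] == g["rationale"]
        for p, g in zip(predictions, gold)
    )
    return hits / len(gold)

print(q2ar_accuracy(
    [{"answer": 2, "rationale": 0}], [{"answer": 2, "rationale": 1}]
))  # 0.0 - right answer, wrong rationale
```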

Breakfast (The Breakfast Actions Dataset)

The Breakfast Actions Dataset comprises 10 actions related to breakfast preparation, performed by 52 different individuals in 18 different kitchens. The dataset is one of the largest fully annotated datasets available. The actions are recorded "in the wild", as opposed to a single controlled lab environment. It consists of over 77 hours of video recordings.

179 papers · 26 benchmarks · Actions, Videos

WebVision

The WebVision dataset is designed to facilitate research on learning visual representations from noisy web data. It is a large-scale dataset of web images, containing more than 2.4 million images crawled from the Flickr website and Google Images search.

179 papers · 2 benchmarks · Images

FUNSD (Form Understanding in Noisy Scanned Documents)

Form Understanding in Noisy Scanned Documents (FUNSD) comprises 199 real, fully annotated, scanned forms. The documents are noisy and vary widely in appearance, making form understanding (FoUn) a challenging task. The proposed dataset can be used for various tasks, including text detection, optical character recognition, spatial layout analysis, and entity labeling/linking.

179 papers · 3 benchmarks · Images, Texts

Wilds

WILDS builds on recent data collection efforts by domain experts across a range of real-world applications, providing a unified collection of datasets with evaluation metrics and train/test splits representative of real-world distribution shifts.

179 papers · 0 benchmarks
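
WILDS is also distributed as a Python package that wraps the splits and loaders. A minimal usage sketch; the dataset name and transform are illustrative choices:

```python
# Minimal sketch using the `wilds` package (pip install wilds); the
# dataset name and transform here are illustrative.
import torchvision.transforms as T
from wilds import get_dataset
from wilds.common.data_loaders import get_train_loader

dataset = get_dataset(dataset="camelyon17", download=True)
train_data = dataset.get_subset("train", transform=T.ToTensor())
loader = get_train_loader("standard", train_data, batch_size=16)
x, y, metadata = next(iter(loader))  # images, labels, domain metadata
```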

QuAC (Question Answering in Context)

Question Answering in Context is a large-scale dataset that consists of around 14K crowdsourced Question Answering dialogs with 98K question-answer pairs in total. Data instances consist of an interactive dialog between two crowd workers: (1) a student who poses a sequence of freeform questions to learn as much as possible about a hidden Wikipedia text, and (2) a teacher who answers the questions by providing short excerpts (spans) from the text.

178 papers · 3 benchmarks · Texts

LabelMe

The LabelMe database is a large collection of images with ground-truth labels for object detection and recognition. The annotations come from two different sources, one of which is the LabelMe online annotation tool.

178 papers · 1 benchmark · Images

WMT 2016

WMT 2016 is a collection of datasets used in the shared tasks of the First Conference on Machine Translation. The conference builds on ten previous Workshops on Statistical Machine Translation.

178 papers · 0 benchmarks · Texts

ARC (AI2 Reasoning Challenge)

AI2's Reasoning Challenge (ARC) is a multiple-choice question-answering dataset containing questions from grade 3 to grade 9 science exams. The dataset is split into two partitions, Easy and Challenge, where the latter contains the more difficult questions that require reasoning. Most of the questions have 4 answer choices, with <1% of all questions having either 3 or 5 answer choices. ARC includes a supporting KB of 14.3M unstructured text passages.

178 papers · 1 benchmark · Texts
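
A loading sketch via the Hugging Face datasets library, assuming the community "ai2_arc" dataset id with its "ARC-Easy"/"ARC-Challenge" configs (the dataset may also live under the allenai namespace):

```python
# Sketch: load ARC through Hugging Face datasets (pip install datasets),
# assuming the "ai2_arc" dataset id and "ARC-Challenge" config.
from datasets import load_dataset

arc = load_dataset("ai2_arc", "ARC-Challenge")
ex = arc["validation"][0]
print(ex["question"])
print(list(zip(ex["choices"]["label"], ex["choices"]["text"])))
print("gold:", ex["answerKey"])
```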
Page 19 of 1000