Datasets

19,997 machine learning datasets


EgoGesture

The EgoGesture dataset contains 2,081 RGB-D videos, 24,161 gesture samples and 2,953,224 frames from 50 distinct subjects.

37 papers | 6 benchmarks | Images, Videos

PeMS04

PeMS04 is a traffic forecasting benchmark.

37 papers | 6 benchmarks | Time series

SQA (SequentialQA)

The SQA dataset was created to explore the task of answering sequences of inter-related questions on HTML tables. It has 6,066 sequences with 17,553 questions in total.

37 papers | 2 benchmarks | Texts

WMT 2018

WMT 2018 is a collection of datasets used in the shared tasks of the Third Conference on Machine Translation. The conference builds on a series of twelve previous annual workshops and conferences on Statistical Machine Translation.

37 papers | 0 benchmarks | Texts

NH-HAZE

NH-HAZE is an image dehazing dataset. Because haze is often not uniformly distributed in real scenes, NH-HAZE provides a non-homogeneous, realistic dataset with pairs of real hazy and corresponding haze-free images. It is the first non-homogeneous image dehazing dataset and contains 55 outdoor scenes. The non-homogeneous haze was introduced into each scene using a professional haze generator that imitates the real conditions of hazy scenes.

37 papers | 2 benchmarks

A*3D

The A*3D dataset is a step toward making autonomous driving safer for pedestrians and the public in the real world. Characteristics:
* 230K human-labeled 3D object annotations in 39,179 LiDAR point cloud frames and corresponding front-facing RGB images.
* Captured at different times (day, night) and in different weather conditions (sun, cloud, rain).

37 papers | 0 benchmarks | Images

iHarmony4

iHarmony4 is a synthesized dataset for Image Harmonization. It contains 4 sub-datasets: HCOCO, HAdobe5k, HFlickr, and Hday2night (based on COCO, Adobe5k, Flickr, day2night datasets respectively), each of which contains synthesized composite images, foreground masks of composite images and corresponding real images.

37 papers | 3 benchmarks

Open Images V4

Open Images V4 offers large scale across several dimensions: 30.1M image-level labels for 19.8k concepts, 15.4M bounding boxes for 600 object classes, and 375k visual relationship annotations involving 57 classes. For object detection in particular, it provides 15x more bounding boxes than the next largest datasets (15.4M boxes on 1.9M images). The images often show complex scenes with several objects (8 annotated objects per image on average). Visual relationships between these objects are also annotated, which supports visual relationship detection, an emerging task that requires structured reasoning.

37 papers | 1 benchmark | Images

HumanAct12

HumanAct12 is a 3D human motion dataset adapted from the polar image and 3D pose dataset PHSPD, with proper temporal cropping and action annotation. It contains 1,191 3D motion clips (90,099 poses in total), categorized into 12 action classes and 34 fine-grained sub-classes. The action types include daily actions such as walk, run, sit down, jump up, and warm up. The fine-grained action types carry more specific information, such as "Warm up by bowing left side" and "Warm up by pressing left leg".

37 papers | 20 benchmarks | Images

HOC (Hallmarks of Cancer)

The Hallmarks of Cancer (HOC) corpus consists of 1,852 PubMed publication abstracts manually annotated by experts according to the Hallmarks of Cancer taxonomy. The taxonomy consists of 37 classes in a hierarchy. Zero or more class labels are assigned to each sentence in the corpus.

37 papers | 6 benchmarks | Texts

SemEval-2018 Task-9

The SemEval-2018 hypernym discovery evaluation benchmark (Camacho-Collados et al. 2018) contains three domains (general, medical and music) and is also available in Italian and Spanish (not in this repository). For each domain a target corpus and vocabulary (i.e. hypernym search space) are provided. The dataset contains both concepts (e.g. dog) and entities (e.g. Manchester United) up to trigrams.

37 papers | 0 benchmarks | Texts

PIPAL (Perceptual Image Processing ALgorithms IQA Dataset)

The PIPAL training set contains 200 reference images, 40 distortion types, 23k distorted images, and more than one million human ratings. Notably, the outputs of GAN-based algorithms are included as a new distortion type. The Elo rating system is used to assign the Mean Opinion Scores (MOS).

37 papers | 0 benchmarks | Images

PF-WILLOW

37 papers | 4 benchmarks

TextOCR

TextOCR is a dataset for benchmarking text recognition on arbitrarily shaped scene text in natural images. It provides ~1M high-quality word annotations on TextVQA images, allowing end-to-end reasoning on downstream tasks such as visual question answering or image captioning.

37 papers | 0 benchmarks | Images, Texts

Learn2Reg

Learn2Reg is a dataset for medical image registration. Learn2Reg covers a wide range of anatomies (brain, abdomen, and thorax), modalities (ultrasound, CT, MR), availability of annotations, as well as intra- and inter-patient registration evaluation.

37 papers | 0 benchmarks | Images

genius

genius is a graph dataset used for node classification.

37 papers | 2 benchmarks | Graphs

Yelp Review Polarity

The Yelp Reviews Polarity dataset is obtained from the Yelp Dataset Challenge in 2015 (1,569,264 samples that have review text).

37 papers | 0 benchmarks | Texts

CV-Bench (Cambrian Vision-Centric Benchmark)

The Cambrian Vision-Centric Benchmark (CV-Bench) is designed to address the limitations of existing vision-centric benchmarks by providing a comprehensive evaluation framework for multimodal large language models (MLLMs). With 2,638 manually inspected examples, CV-Bench is significantly larger than other vision-centric MLLM benchmarks, offering 3.5 times more examples than RealWorldQA and 8.8 times more than MMVP.

37 papers | 0 benchmarks | Images, Texts

RAP (Richly Annotated Pedestrian)

The Richly Annotated Pedestrian (RAP) dataset is a dataset for pedestrian attribute recognition. It contains 41,585 images collected from indoor surveillance cameras. Each image is annotated with 72 attributes, but only the 51 binary attributes with a positive ratio above 1% are selected for evaluation. The training set has 33,268 images and the test set has 8,317.

36 papers | 2 benchmarks | Images

METR-LA

METR-LA is a dataset for traffic prediction.

36 papers | 17 benchmarks | Time series
