19,997 machine learning datasets
19,997 dataset results
The EgoGesture dataset contains 2,081 RGB-D videos, 24,161 gesture samples and 2,953,224 frames from 50 distinct subjects.
PeMS04 is a traffic forecasting benchmark.
The SQA dataset was created to explore the task of answering sequences of inter-related questions on HTML tables. It has 6,066 sequences with 17,553 questions in total.
WMT 2018 is a collection of datasets used in shared tasks of the Third Conference on Machine Translation. The conference builds on a series of twelve previous annual workshops and conferences on Statistical Machine Translation.
NN-HAZE is an image dehazing dataset. Since in many real cases haze is not uniformly distributed NH-HAZE, a non-homogeneous realistic dataset with pairs of real hazy and corresponding haze-free images. This is the first non-homogeneous image dehazing dataset and contains 55 outdoor scenes. The non-homogeneous haze has been introduced in the scene using a professional haze generator that imitates the real conditions of hazy scenes.
The A*3D dataset is a step forward to make autonomous driving safer for pedestrians and the public in the real world. Characteristics: * 230K human-labeled 3D object annotations in 39,179 LiDAR point cloud frames and corresponding frontal-facing RGB images. * Captured at different times (day, night) and weathers (sun, cloud, rain).
iHarmony4 is a synthesized dataset for Image Harmonization. It contains 4 sub-datasets: HCOCO, HAdobe5k, HFlickr, and Hday2night (based on COCO, Adobe5k, Flickr, day2night datasets respectively), each of which contains synthesized composite images, foreground masks of composite images and corresponding real images.
Open Images V4 offers large scale across several dimensions: 30.1M image-level labels for 19.8k concepts, 15.4M bounding boxes for 600 object classes, and 375k visual relationship annotations involving 57 classes. For object detection in particular, 15x more bounding boxes than the next largest datasets (15.4M boxes on 1.9M images) are provided. The images often show complex scenes with several objects (8 annotated objects per image on average). Visual relationships between them are annotated, which support visual relationship detection, an emerging task that requires structured reasoning.
HumanAct12 is a new 3D human motion dataset adopted from the polar image and 3D pose dataset PHSPD, with proper temporal cropping and action annotating. Statistically, there are 1191 3D motion clips(and 90,099 poses in total) which are categorized into 12 action classes, and 34 fine-grained sub-classes. The action types includes daily actions such as walk, run, sit down, jump up, warm up, etc. Fine-grained action types contain more specific information like Warm up by bowing left side, Warm up by pressing left leg, etc.
The Hallmarks of Cancer (*HOC) corpus consists of 1852 PubMed publication abstracts manually annotated by experts according to the Hallmarks of Cancer taxonomy. The taxonomy consists of 37 classes in a hierarchy. Zero or more class labels are assigned to each sentence in the corpus.
The SemEval-2018 hypernym discovery evaluation benchmark (Camacho-Collados et al. 2018) contains three domains (general, medical and music) and is also available in Italian and Spanish (not in this repository). For each domain a target corpus and vocabulary (i.e. hypernym search space) are provided. The dataset contains both concepts (e.g. dog) and entities (e.g. Manchester United) up to trigrams.
PIPAL training set contains 200 reference images, 40 distortion types, 23k distortion images, and more than one million human ratings. Especially, we include GAN-based algorithms’ outputs as a new GAN-based distortion type. We employ the Elo rating system to assign the Mean Opinion Scores (MOS).
TextOCR is a dataset to benchmark text recognition on arbitrary shaped scene-text. TextOCR requires models to perform text-recognition on arbitrary shaped scene-text present on natural images. TextOCR provides ~1M high quality word annotations on TextVQA images allowing application of end-to-end reasoning on downstream tasks such as visual question answering or image captioning.
Learn2Reg is a dataset for medical image registration. Learn2Reg covers a wide range of anatomies (brain, abdomen, and thorax), modalities (ultrasound, CT, MR), availability of annotations, as well as intra- and inter-patient registration evaluation.
node classification on genius
The Yelp Reviews Polarity dataset is obtained from the Yelp Dataset Challenge in 2015 (1,569,264 samples that have review text).
The Cambrian Vision-Centric Benchmark (CV-Bench) is designed to address the limitations of existing vision-centric benchmarks by providing a comprehensive evaluation framework for multimodal large language models (MLLMs). With 2,638 manually-inspected examples, CV-Bench significantly surpasses other vision-centric MLLM benchmarks, offering 3.5 times more examples than RealWorldQA and 8.8 times more than MMVP.
The Richly Annotated Pedestrian (RAP) dataset is a dataset for pedestrian attribute recognition. It contains 41,585 images collected from indoor surveillance cameras. Each image is annotated with 72 attributes, while only 51 binary attributes with the positive ratio above 1% are selected for evaluation. There are 33,268 images for the training set and 8,317 for testing.
METR-LA is a dataset for traffic prediction.