19,997 machine learning datasets
19,997 dataset results
The DIHARD II development and evaluation sets draw from a diverse set of sources exhibiting wide variation in recording equipment, recording environment, ambient noise, number of speakers, and speaker demographics. The development set includes reference diarization and speech segmentation and may be used for any purpose including system development or training.
iBims-1 (independent Benchmark images and matched scans - version 1) is a new high-quality RGB-D dataset, especially designed for testing single-image depth estimation (SIDE) methods. A customized acquisition setup, composed of a digital single-lens reflex (DSLR) camera and a high-precision laser scanner was used to acquire high-resolution images and highly accurate depth maps of diverse indoors scenarios.
CSL is a synthetic dataset introduced in Murphy et al. (2019) to test the expressivity of GNNs. In particular, graphs are isomorphic if they have the same degree and the task is to classify non-isomorphic graphs.
GrailQA is a new large-scale, high-quality dataset for question answering on knowledge bases (KBQA) on Freebase with 64,331 questions annotated with both answers and corresponding logical forms in different syntax (i.e., SPARQL, S-expression, etc.). It can be used to test three levels of generalization in KBQA: i.i.d., compositional, and zero-shot.
CLEAR is a continual image classification benchmark dataset with a natural temporal evolution of visual concepts in the real world that spans a decade (2004-2014). CLEAR is built from existing large-scale image collections (YFCC100M) through a novel and scalable low-cost approach to visio-linguistic dataset curation. The pipeline makes use of pretrained vision language models (e.g. CLIP) to interactively build labeled datasets, which are further validated with crowd-sourcing to remove errors and even inappropriate images (hidden in original YFCC100M). The major strength of CLEAR over prior CL benchmarks is the smooth temporal evolution of visual concepts with real-world imagery, including both high-quality labeled data along with abundant unlabeled samples per time period for continual semi-supervised learning.
We introduce an object detection dataset in challenging adverse weather conditions covering 12000 samples in real-world driving scenes and 1500 samples in controlled weather conditions within a fog chamber. The dataset includes different weather conditions like fog, snow, and rain and was acquired by over 10,000 km of driving in northern Europe. The driven route with cities along the road is shown on the right. In total, 100k Objekts were labeled with accurate 2D and 3D bounding boxes. The main contributions of this dataset are: - We provide a proving ground for a broad range of algorithms covering signal enhancement, domain adaptation, object detection, or multi-modal sensor fusion, focusing on the learning of robust redundancies between sensors, especially if they fail asymmetrically in different weather conditions. - The dataset was created with the initial intention to showcase methods, which learn of robust redundancies between the sensor and enable a raw data sensor fusion in cas
xP3 is a multilingual dataset for multitask prompted finetuning. It is a composite of supervised datasets in 46 languages with English and machine-translated prompts.
The Re-DocRED Dataset resolved the following problems of DocRED:
For the details of the work, the readers are refer to the paper "Feature Pyramid and Hierarchical Boosting Network for Pavement Crack Detection" (FPHB), T-ITS 2019. You can find the paper in https://www.researchgate.net/publication/330244656_Feature_Pyramid_and_Hierarchical_Boosting_Network_for_Pavement_Crack_Detection or https://arxiv.org/abs/1901.06340.
minesweeper is a synthetic graph emulating the eponymous game.
ToxicChat is a novel benchmark dataset constructed based on real user queries from an open-source chatbot. Unlike previous toxicity detection benchmarks that primarily rely on social media content, ToxicChat captures the rich and nuanced phenomena inherent in real-world user-AI interactions. This unique dataset reveals significant domain differences compared to social media contents, making it a valuable resource for exploring the challenges of toxicity detection in user-AI conversations¹.
This dataset focus on two blur types: camera motion blur and defocus blur. For each type of blur we synthesize $5$ scenes using Blender. We manually place multi-view cameras to mimic real data capture. To render images with camera motion blur, we randomly perturb the camera pose, and then linearly interpolate poses between the original and perturbed poses for each view. We render images from interpolated poses and blend them in linear RGB space to generate the final blurry images. For defocus blur, we use the built-in functionality to render depth-of-field images. We fix the aperture and randomly choose a focus plane between the nearest and furthest depth.
The TUD-L (TUD Light) dataset is part of the Benchmark for 6D Object Pose Estimation (BOP). Let me provide you with some details about this dataset:
The Crowd Instance-level Human Parsing (CIHP) dataset has 38,280 diverse human images. Each image in CIHP is labeled with pixel-wise annotations on 20 categories and instance-level identification. The dataset can be used for the human part segmentation task.
The Wide Multi Channel Presentation Attack (WMCA) database consists of 1941 short video recordings of both bonafide and presentation attacks from 72 different identities. The data is recorded from several channels including color, depth, infra-red, and thermal.
The JSB chorales are a set of short, four-voice pieces of music well-noted for their stylistic homogeneity. The chorales were originally composed by Johann Sebastian Bach in the 18th century. He wrote them by first taking pre-existing melodies from contemporary Lutheran hymns and then harmonising them to create the parts for the remaining three voices. The version of the dataset used canonically in representation learning contexts consists of 382 such chorales, with a train/validation/test split of 229, 76 and 77 samples respectively.
The Pinterest dataset contains more than 1 million images associated to Pinterest users’ who have “pinned” them.
The Completion3D benchmark is a dataset for evaluating state-of-the-art 3D Object Point Cloud Completion methods. Ggiven a partial 3D object point cloud the goal is to infer a complete 3D point cloud for the object.
A new multitask action quality assessment (AQA) dataset, the largest to date, comprising of more than 1600 diving samples; contains detailed annotations for fine-grained action recognition, commentary generation, and estimating the AQA score. Videos from multiple angles provided wherever available.
TallyQA is a large-scale dataset for open-ended counting.