3,275 machine learning datasets
MMFlood is a remote sensing dataset derived from Sentinel-1 (VV-VH), MapZen (DEM) and OpenStreetMap (Hydrography). It provides a complete and well-rounded set of data specifically designed for flood events, focusing on three main features: worldwide distribution, manually validated annotations and multiple modalities.
Large, multimodal biometric dataset: It contains still images and videos of over 1,000 people captured at various ranges (up to 1,000 meters) and elevations (up to 400 meters) using a diverse set of cameras (commercial, military-grade, specialized).
Manual crown delineation of individual trees in two countries: Denmark and Finland.
SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model
OoDIS is a benchmark dataset for anomaly instance segmentation, crucial for autonomous vehicle safety. It extends existing anomaly segmentation benchmarks to focus on the segmentation of individual out-of-distribution (OOD) objects.
The SUGARCREPE++ dataset evaluates the sensitivity of vision-language models (VLMs) and unimodal language models (ULMs) to semantic and lexical alterations. Each sample in the SUGARCREPE++ dataset consists of an image and a corresponding triplet of captions: a pair of semantically equivalent but lexically different positive captions and one hard negative caption. This poses a 3-way semantic (in)equivalence problem to the language models. The SUGARCREPE dataset consists of (only) one positive and one hard negative caption for each image. Relative to the negative caption, a single positive caption can either have low or high lexical overlap. The original SUGARCREPE only captures the high overlap case. To evaluate the sensitivity of encoded semantics to lexical alteration, we require an additional positive caption with a different lexical composition. SUGARCREPE++ fills this gap by adding an additional positive caption enabling a more thorough assessment of models’ abilities to handle se
We introduce HARPER, a novel dataset for 3D body pose estimation and forecasting in dyadic interactions between users and Spot, the quadruped robot manufactured by Boston Dynamics. The key novelty is the focus on the robot's perspective, i.e., on the data captured by the robot's sensors. These make 3D body pose analysis challenging because, being close to the ground, they capture humans only partially. The scenario underlying HARPER includes 15 actions, of which 10 involve physical contact between the robot and users. The Corpus contains not only the recordings of the built-in stereo cameras of Spot, but also those of a 6-camera OptiTrack system (all recordings are synchronized). This leads to ground-truth skeletal representations with sub-millimeter precision. In addition, the Corpus includes reproducible benchmarks on 3D Human Pose Estimation, Human Pose Forecasting, and Collision Prediction, all based on publicly available baseline approaches. This enables future HARPER users to
The IAM database contains 13,353 images of handwritten lines of text created by 657 writers. The texts those writers transcribed are from the Lancaster-Oslo/Bergen Corpus of British English. In total, the database comprises 1,539 handwritten pages containing 115,320 words and is categorized as part of the modern collection. The database is labeled at the sentence, line, and word levels.
We collect, organize and open-source a large-scale multimodal instruction dataset, Infinity-MM, consisting of tens of millions of samples. Through quality filtering and deduplication, the dataset has high quality and diversity. We propose a synthetic data generation method based on open-source models and a labeling system, using detailed image annotations and diverse question generation.
The Helvipad dataset is a real-world stereo dataset designed for omnidirectional depth estimation. It comprises 39,553 paired equirectangular images captured using a top-bottom 360° camera setup and corresponding pixel-wise depth and disparity labels derived from LiDAR point clouds. The dataset spans diverse indoor and outdoor scenes under varying lighting conditions, including night-time environments.
A Dense-text Image Benchmark to evaluate large generation models' text generation ability.
We introduce a challenging and comprehensive benchmark for open-instruction 6-DoF object rearrangement tasks, termed Open6DOR.
RefRef is a synthetic dataset and benchmark designed for the task of reconstructing scenes with complex refractive and reflective objects. Our dataset consists of 50 objects categorized based on their geometric and material complexity: single-material convex objects, single-material non-convex objects, and multi-material non-convex objects, where the materials have different colors, opacities, and refractive indices. Each object is placed in three distinct bounded environments and one unbounded environment, resulting in 150 unique scenes with diverse geometries, material properties, and backgrounds. Our dataset provides a controlled setting for evaluating and developing 3D reconstruction and novel view synthesis methods that handle complex optical effects.
VOT2019 is a Visual Object Tracking benchmark for short-term tracking in RGB.
The MLFP dataset consists of face presentation attacks captured with seven 3D latex masks and three 2D print attacks. The dataset contains videos captured from color, thermal and infrared channels.
The VIVA challenge’s dataset is a multimodal dynamic hand gesture dataset specifically designed with difficult settings of cluttered background, volatile illumination, and frequent occlusion for studying natural human activities in real-world driving settings. This dataset was captured using a Microsoft Kinect device, and contains 885 intensity and depth video sequences of 19 different dynamic hand gestures performed by 8 subjects inside a vehicle.
The Fraunhofer IPA Bin-Picking dataset is a large-scale dataset comprising both simulated and real-world scenes for various objects (potentially having symmetries) and is fully annotated with 6D poses. A physics simulation is used to create scenes of many parts in bulk by dropping objects in a random position and orientation above a bin. Additionally, this dataset extends the Siléane dataset by providing more samples. This makes it possible, for example, to train deep neural networks and benchmark their performance on the public Siléane dataset.
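The drop step described for the Fraunhofer IPA Bin-Picking dataset can be sketched as sampling a random pose above a bin. This is only an illustration, not the dataset authors' code: the function name `sample_drop_pose`, the bin footprint, and the drop height are assumptions, and the physics simulation that lets the objects fall and settle is not shown.

```python
import math
import random


def sample_drop_pose(bin_size=(0.6, 0.4), drop_height=0.5, rng=None):
    """Sample a random position above a bin and a uniformly random orientation.

    bin_size and drop_height are hypothetical values in meters.
    Returns (position, quaternion), with the quaternion as (x, y, z, w).
    """
    rng = rng or random.Random()

    # Position: uniform over the bin footprint, at a fixed height above it.
    x = rng.uniform(-bin_size[0] / 2, bin_size[0] / 2)
    y = rng.uniform(-bin_size[1] / 2, bin_size[1] / 2)
    position = (x, y, drop_height)

    # Orientation: uniformly distributed unit quaternion (Shoemake's method).
    u1, u2, u3 = rng.random(), rng.random(), rng.random()
    quaternion = (
        math.sqrt(1 - u1) * math.sin(2 * math.pi * u2),
        math.sqrt(1 - u1) * math.cos(2 * math.pi * u2),
        math.sqrt(u1) * math.sin(2 * math.pi * u3),
        math.sqrt(u1) * math.cos(2 * math.pi * u3),
    )
    return position, quaternion


# Seeded generator so the sampled pose is reproducible.
position, quaternion = sample_drop_pose(rng=random.Random(0))
```

Sampling the orientation as a unit quaternion avoids the bias toward the poles that comes from drawing three Euler angles independently.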
Middlebury 2001 is a stereo dataset of indoor scenes with multiple handcrafted layouts.
The images in the DukeMTMC-attribute dataset come from Duke University. There are 1812 identities and 34183 annotated bounding boxes in the DukeMTMC-attribute dataset. This dataset contains 702 identities for training and 1110 identities for testing, corresponding to 16522 and 17661 images respectively. The attributes are annotated at the identity level; every image in this dataset is annotated with 23 attributes.
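As a quick consistency check on the DukeMTMC-attribute figures quoted above, the training and test splits can be summed and compared with the stated totals. The variable names are illustrative only; the numbers are taken from the description.

```python
# Split sizes as stated in the DukeMTMC-attribute description.
train_ids, test_ids = 702, 1110
train_images, test_images = 16522, 17661

total_ids = train_ids + test_ids
total_images = train_images + test_images

print(total_ids)     # 1812, matches the stated number of identities
print(total_images)  # 34183, matches the stated number of bounding boxes
```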