Datasets

3,275 machine learning datasets

MMFlood

MMFlood is a remote sensing dataset derived from Sentinel-1 (VV-VH), MapZen (DEM) and OpenStreetMap (Hydrography). It provides a complete and well-rounded set of data specifically designed for flood events, focusing on three main features: worldwide distribution, manually validated annotations and multiple modalities.

5 papers | 1 benchmark | Images

BTS3.1 (Expanding Accurate Person Recognition to New Altitudes and Ranges: The BRIAR Dataset)

A large, multimodal biometric dataset containing still images and videos of over 1,000 people captured at various ranges (up to 1,000 meters) and elevations (up to 400 meters) using a diverse set of cameras (commercial, military-grade, specialized).

5 papers | 7 benchmarks | Images, Videos

25kTrees (Individual Tree Crown Annotations)

Manual crown delineation of individual trees in two countries: Denmark and Finland.

5 papers | 0 benchmarks | Images

SkyEye-968k

SkyEyeGPT: Unifying Remote Sensing Vision-Language Tasks via Instruction Tuning with Large Language Model

5 papers | 0 benchmarks | Images, Texts

OoDIS (Anomaly Instance Segmentation Benchmark)

OoDIS is a benchmark dataset for anomaly instance segmentation, crucial for autonomous vehicle safety. It extends existing anomaly segmentation benchmarks to focus on the segmentation of individual out-of-distribution (OOD) objects.

5 papers | 12 benchmarks | Images

SugarCrepe++

The SUGARCREPE++ dataset evaluates the sensitivity of vision language models (VLMs) and unimodal language models (ULMs) to semantic and lexical alterations. Each sample in the SugarCrepe++ dataset consists of an image and a corresponding triplet of captions: a pair of semantically equivalent but lexically different positive captions and one hard negative caption. This poses a 3-way semantic (in)equivalence problem to the language models. The SUGARCREPE dataset consists of (only) one positive and one hard negative caption for each image. Relative to the negative caption, a single positive caption can either have low or high lexical overlap. The original SUGARCREPE only captures the high overlap case. To evaluate the sensitivity of encoded semantics to lexical alteration, we require an additional positive caption with a different lexical composition. SUGARCREPE++ fills this gap by adding an additional positive caption enabling a more thorough assessment of models’ abilities to handle se

5 papers | 0 benchmarks | Images, Texts

HARPER (Exploring 3D Human Pose Estimation and Forecasting from the Robot’s Perspective: The HARPER Dataset)

We introduce HARPER, a novel dataset for 3D body pose estimation and forecasting in dyadic interactions between users and Spot, the quadruped robot manufactured by Boston Dynamics. The key novelty is the focus on the robot's perspective, i.e., on the data captured by the robot's sensors. This makes 3D body pose analysis challenging because sensors close to the ground capture humans only partially. The scenario underlying HARPER includes 15 actions, of which 10 involve physical contact between the robot and users. The corpus contains not only the recordings of Spot's built-in stereo cameras, but also those of a 6-camera OptiTrack system (all recordings are synchronized). This yields ground-truth skeletal representations with sub-millimeter precision. In addition, the corpus includes reproducible benchmarks on 3D Human Pose Estimation, Human Pose Forecasting, and Collision Prediction, all based on publicly available baseline approaches. This enables future HARPER users to

5 papers | 18 benchmarks | 3D, Images, RGB-D, Videos

M3GIA

5 papers | 0 benchmarks | Images, Texts

IAM(line-level) (Line-level Handwritten Text Recognition on IAM)

The IAM database contains 13,353 images of handwritten lines of text created by 657 writers, who transcribed texts from the Lancaster-Oslo/Bergen Corpus of British English. In total, it comprises 1,539 handwritten pages containing 115,320 words and is categorized as part of the modern collection. The database is labeled at the sentence, line, and word levels.

5 papers | 4 benchmarks | Images, Texts

Infinity-MM

We collect, organize and open-source the large-scale multimodal instruction dataset, Infinity-MM, consisting of tens of millions of samples. Quality filtering and deduplication give the dataset high quality and diversity. We propose a synthetic data generation method based on open-source models and a labeling system, using detailed image annotations and diverse question generation.

5 papers | 0 benchmarks | Images, Texts, Videos

Helvipad

The Helvipad dataset is a real-world stereo dataset designed for omnidirectional depth estimation. It comprises 39,553 paired equirectangular images captured using a top-bottom 360° camera setup and corresponding pixel-wise depth and disparity labels derived from LiDAR point clouds. The dataset spans diverse indoor and outdoor scenes under varying lighting conditions, including night-time environments.

5 papers | 24 benchmarks | Images

TextAtlasEval

A dense-text image benchmark for evaluating the ability of large generative models to generate text.

5 papers | 15 benchmarks | Images, Texts

Open6DOR V2 (Benchmarking Open-instruction 6-DoF Object Rearrangement and A VLM-based Approach)

We introduce a challenging and comprehensive benchmark for open-instruction 6-DoF object rearrangement tasks, termed Open6DOR.

5 papers | 6 benchmarks | Images, Texts

RefRef (RefRef: A Synthetic Dataset and Benchmark for Reconstructing Refractive and Reflective Objects)

RefRef is a synthetic dataset and benchmark designed for the task of reconstructing scenes with complex refractive and reflective objects. Our dataset consists of 50 objects categorized based on their geometric and material complexity: single-material convex objects, single-material non-convex objects, and multi-material non-convex objects, where the materials have different colors, opacities, and refractive indices. Each object is placed in three distinct bounded environments and one unbounded environment, resulting in 150 unique scenes with diverse geometries, material properties, and backgrounds. Our dataset provides a controlled setting for evaluating and developing 3D reconstruction and novel view synthesis methods that handle complex optical effects.

5 papers | 1 benchmark | 3D, Images

VOT2019

VOT2019 is a Visual Object Tracking benchmark for short-term tracking in RGB.

4 papers | 4 benchmarks | Images, Tracking, Videos

MLFP (Multispectral Latex Mask based Video Face Presentation Attack)

The MLFP dataset consists of face presentation attacks captured with seven 3D latex masks and three 2D print attacks. The dataset contains videos captured from color, thermal and infrared channels.

4 papers | 8 benchmarks | Images

VIVA (Vision for Intelligent Vehicles and Applications)

The VIVA challenge’s dataset is a multimodal dynamic hand gesture dataset specifically designed with difficult settings of cluttered background, volatile illumination, and frequent occlusion for studying natural human activities in real-world driving settings. This dataset was captured using a Microsoft Kinect device, and contains 885 intensity and depth video sequences of 19 different dynamic hand gestures performed by 8 subjects inside a vehicle.

4 papers | 0 benchmarks | Images

Fraunhofer IPA Bin-Picking

The Fraunhofer IPA Bin-Picking dataset is a large-scale dataset comprising both simulated and real-world scenes for various objects (potentially having symmetries) and is fully annotated with 6D poses. A physics simulation is used to create scenes of many parts in bulk by dropping objects in a random position and orientation above a bin. Additionally, this dataset extends the Siléane dataset by providing more samples. This makes it possible, for example, to train deep neural networks and benchmark their performance on the public Siléane dataset.

4 papers | 0 benchmarks | 6D, Images

Middlebury 2001

Middlebury 2001 is a stereo dataset of indoor scenes with multiple handcrafted layouts.

4 papers | 0 benchmarks | Images, Stereo

DukeMTMC-attribute

The images in the DukeMTMC-attribute dataset come from Duke University. The dataset contains 1,812 identities and 34,183 annotated bounding boxes, split into 702 identities for training and 1,110 identities for testing, corresponding to 16,522 and 17,661 images respectively. Attributes are annotated at the identity level; every image in the dataset is annotated with 23 attributes.

4 papers | 2 benchmarks | Images, Texts, Videos
