Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

135 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

135 dataset results (Point cloud)

WOD-C

🤖 Robo3D - The WOD-C Benchmark. WOD-C is an evaluation benchmark for robust and reliable 3D perception in autonomous driving. It probes the robustness of 3D detectors and segmentors under out-of-distribution (OoD) scenarios, against natural corruptions that occur in real-world environments.

5 papers · 3 benchmarks · Point cloud
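
As a rough illustration of the kind of perturbations a corruption-robustness benchmark like WOD-C evaluates against, the sketch below applies two simple synthetic corruptions (Gaussian jitter and random dropout) to a point cloud. The function and its parameters are illustrative, not WOD-C's actual corruption suite.

```python
import numpy as np

def corrupt_point_cloud(points, jitter_std=0.02, drop_ratio=0.3, seed=0):
    """Apply two simple OoD-style corruptions to an (N, 3) point cloud:
    Gaussian coordinate jitter and random point dropout."""
    rng = np.random.default_rng(seed)
    # Jitter: perturb every coordinate with small Gaussian noise
    noisy = points + rng.normal(scale=jitter_std, size=points.shape)
    # Dropout: keep a random subset of points, simulating sensor loss
    keep = rng.random(len(noisy)) > drop_ratio
    return noisy[keep]

# Example: corrupt a synthetic cloud and compare sizes
cloud = np.random.rand(2048, 3).astype(np.float32)
corrupted = corrupt_point_cloud(cloud)
print(cloud.shape, "->", corrupted.shape)
```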

Sydney Urban Objects

This dataset contains a variety of common urban road objects scanned with a Velodyne HDL-64E LiDAR in the central business district (CBD) of Sydney, Australia. It comprises 631 individual scans of objects across the classes of vehicles, pedestrians, signs, and trees.

4 papers · 3 benchmarks · 3D, LiDAR, Point cloud

K-Lane (KAIST-Lane)

KAIST-Lane (K-Lane) is the world's first and largest public urban-road and highway lane dataset for LiDAR. It contains more than 15K frames with annotations of up to six lanes under various road and traffic conditions, e.g., occluded roads at multiple occlusion levels, roads at day and night, and merging (converging and diverging) and curved lanes.

4 papers · 2 benchmarks · Images, Point cloud

S3E

S3E is a large-scale multimodal dataset captured by a fleet of unmanned ground vehicles along four designed collaborative trajectory paradigms. It covers 7 outdoor and 5 indoor scenes, each exceeding 200 seconds, with well-synchronized and calibrated high-quality stereo camera, LiDAR, and high-frequency IMU data.

4 papers · 0 benchmarks · Images, LiDAR, Point cloud

DHB Dataset (Dynamic Human Bodies Dataset)

The Dynamic Human Bodies dataset (DHB) contains 14 point cloud sequences: 10 from the MITAMA dataset and 4 from the 8IVFB dataset. The sequences record real-world 3D human motions with large, non-rigid deformations. Overall, the dataset contains more than 3,000 point cloud frames, each with 1,024 points.

4 papers · 2 benchmarks · Point cloud
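
Given DHB's fixed layout (sequences of frames with 1,024 points each), per-frame deformation can be summarized with a standard Chamfer distance. A minimal sketch, using a synthetic stand-in for an actual DHB sequence:

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between two (N, 3) point clouds."""
    d_ab, _ = cKDTree(b).query(a)  # nearest neighbor in b for each point of a
    d_ba, _ = cKDTree(a).query(b)  # nearest neighbor in a for each point of b
    return d_ab.mean() + d_ba.mean()

# A DHB-style sequence: T frames of 1,024 points each (synthetic here)
sequence = np.random.rand(30, 1024, 3).astype(np.float32)
motion = [chamfer_distance(sequence[t], sequence[t + 1])
          for t in range(len(sequence) - 1)]
print("mean frame-to-frame deformation:", float(np.mean(motion)))
```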

Human-M3

Human-M3 is an outdoor multi-modal, multi-view, multi-person human pose dataset that includes not only multi-view RGB videos of outdoor scenes but also the corresponding point clouds.

4 papers · 0 benchmarks · LiDAR, Point cloud

P2S (Points2Surf)

This dataset was introduced with Points2Surf, a method that reconstructs surface meshes from point clouds.

4 papers · 0 benchmarks · 3D meshes, Point cloud
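
Points2Surf itself is a learned reconstruction method; as a rough classical stand-in for the same point-cloud-to-mesh task, the sketch below runs Poisson surface reconstruction with Open3D. The file paths are placeholders.

```python
import open3d as o3d

# Load a point cloud (path is a placeholder for an actual sample)
pcd = o3d.io.read_point_cloud("sample_cloud.ply")

# Poisson reconstruction needs oriented normals
pcd.estimate_normals(
    search_param=o3d.geometry.KDTreeSearchParamHybrid(radius=0.1, max_nn=30))

# Fit a watertight triangle mesh to the points
mesh, densities = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
    pcd, depth=9)
o3d.io.write_triangle_mesh("reconstructed_mesh.ply", mesh)
```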

3D MM-Vet

3D MM-Vet is a 3D evaluation benchmark that assesses four levels of capability in embodied interaction scenarios, ranging from basic perception to the generation of control statements.

4 papers · 1 benchmark · 3D, Point cloud

DrivAerNet (A Parametric Car Dataset for Data-driven Aerodynamic Design and Graph-Based Drag Prediction)

DrivAerNet is a large-scale, high-fidelity CFD dataset of 3D industry-standard car shapes designed for data-driven aerodynamic design. It comprises 4,000 high-quality 3D car meshes and their corresponding aerodynamic performance coefficients, alongside full 3D flow-field information.

4 papers · 1 benchmark · 3D, 3D meshes, Physics, Point cloud, Tabular
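
For drag prediction, each DrivAerNet sample pairs a car mesh with an aerodynamic coefficient. A minimal PyTorch Dataset sketch of that pairing follows; the index-file layout and column names (mesh_path, cd) are assumptions, not DrivAerNet's actual format.

```python
import pandas as pd
import torch
from torch.utils.data import Dataset
import trimesh

class DragDataset(Dataset):
    """Pairs a car mesh with its drag coefficient for regression.
    The CSV layout ('mesh_path', 'cd') is a hypothetical stand-in
    for however the dataset actually ships its labels."""

    def __init__(self, index_csv, n_points=2048):
        self.index = pd.read_csv(index_csv)
        self.n_points = n_points

    def __len__(self):
        return len(self.index)

    def __getitem__(self, i):
        row = self.index.iloc[i]
        mesh = trimesh.load(row["mesh_path"], force="mesh")
        # Sample a fixed-size point cloud from the mesh surface
        points, _ = trimesh.sample.sample_surface(mesh, self.n_points)
        x = torch.as_tensor(points, dtype=torch.float32)
        y = torch.tensor([row["cd"]], dtype=torch.float32)  # drag coefficient
        return x, y
```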

MUSES: MUlti-SEnsor Semantic perception dataset (The Multi-Sensor Semantic Perception Dataset for Driving under Uncertainty)

MUSES offers 2,500 multi-modal scenes, evenly distributed across various combinations of weather conditions (clear, fog, rain, and snow) and types of illumination (daytime, nighttime). Each image includes high-quality 2D pixel-level panoptic annotations, as well as class-level and novel instance-level uncertainty annotations. Further, each adverse-condition image has a corresponding image of the same scene taken under clear-weather, daytime conditions. The annotation process for MUSES utilizes all available sensor data, allowing the annotators to reliably label degraded image regions that are still discernible in other modalities. This results in better pixel coverage in the annotations and creates a more challenging evaluation setup.

4 papers · 15 benchmarks · Images, LiDAR, Point cloud, RGB-D

MultiScan

We introduce MultiScan, a scalable RGBD dataset construction pipeline leveraging commodity mobile devices to scan indoor scenes with articulated objects and web-based semantic annotation interfaces to efficiently annotate object and part semantics and part mobility parameters. We use this pipeline to collect 273 scans of 117 indoor scenes containing 10,957 objects and 5,129 parts. The resulting MultiScan dataset provides RGBD streams with per-frame camera poses, textured 3D surface meshes, richly annotated part-level and object-level semantic labels, and part mobility parameters. We validate our dataset on instance segmentation and part mobility estimation tasks and benchmark methods for these tasks from prior work. Our experiments show that part segmentation and mobility estimation in real 3D scenes remain challenging despite recent progress in 3D object segmentation.

4 papers · 12 benchmarks · Images, Point cloud

Ford Campus Vision and Lidar Data Set

The Ford Campus Vision and Lidar Data Set was collected by an autonomous ground vehicle testbed based on a modified Ford F-250 pickup truck. The vehicle is outfitted with professional (Applanix POS LV) and consumer (Xsens MTI-G) inertial measurement units (IMUs), a Velodyne 3D LiDAR scanner, two push-broom forward-looking Riegl LiDARs, and a Point Grey Ladybug3 omnidirectional camera system.

3 papers · 0 benchmarks · LiDAR, Point cloud, Videos

MVHand

MVHand is a multi-view hand posture dataset providing complete 3D point clouds of real-world hands.

3 papers · 0 benchmarks · 3D, Point cloud

OpenTrench3D

OpenTrench3D is the first publicly available point cloud dataset of underground utilities from open trenches. It features 310 fully annotated point clouds with a total of 528 million points, categorised into 5 unique classes. The point clouds are photogrammetrically derived and capture detailed scenes of open trenches, revealing underground utilities.

3 papers · 9 benchmarks · 3D, Point cloud

VBR (A Vision Benchmark in Rome)

VBR is a vision and perception research dataset collected in Rome, featuring RGB data, 3D point clouds, IMU, and GPS data. It introduces a new benchmark targeting visual odometry and SLAM, to advance research in autonomous robotics and computer vision. The dataset complements existing ones by simultaneously addressing several issues, such as environment diversity, motion patterns, and sensor frequency. It uses up-to-date devices and presents effective procedures to accurately calibrate the intrinsics and extrinsics of the sensors while addressing temporal synchronization. The recordings cover multi-floor buildings, gardens, and urban and highway scenarios. By combining handheld and car-based data collection, the setup can simulate any robot (quadrupeds, quadrotors, autonomous vehicles). The dataset includes accurate 6-DoF ground truth based on a novel methodology that refines the RTK-GPS estimate with LiDAR point clouds through bundle adjustment.

3 papers · 0 benchmarks · 3D, LiDAR, Point cloud, RGB Video, Stereo, Tracking
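
The ground-truth idea, refining absolute GPS fixes with relative constraints via least squares, can be shown on a 2D toy problem. This is only a sketch of the fusion principle with synthetic data, not the authors' bundle adjustment pipeline:

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(0)
true = np.cumsum(rng.normal(size=(50, 2)), axis=0)   # ground-truth 2D path
gps = true + rng.normal(scale=0.5, size=true.shape)  # noisy absolute fixes
odo = np.diff(true, axis=0) + rng.normal(scale=0.05, size=(49, 2))  # relative steps

def residuals(x, sigma_gps=0.5, sigma_odo=0.05):
    p = x.reshape(-1, 2)
    r_abs = ((p - gps) / sigma_gps).ravel()                    # absolute (GPS) terms
    r_rel = ((np.diff(p, axis=0) - odo) / sigma_odo).ravel()   # relative (odometry) terms
    return np.concatenate([r_abs, r_rel])

sol = least_squares(residuals, gps.ravel())
refined = sol.x.reshape(-1, 2)
print("GPS RMSE:    ", np.sqrt(((gps - true) ** 2).mean()))
print("refined RMSE:", np.sqrt(((refined - true) ** 2).mean()))
```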

ARCH2S (Dataset and Benchmark for Learning Exterior Architectural Structures from Point Clouds)

Precise segmentation of architectural structures provides detailed information about building components, enhancing our understanding of and interaction with the built environment. Nevertheless, existing outdoor 3D point cloud datasets offer only limited detailed annotations of architectural exteriors, due to privacy concerns and the high costs of data acquisition and annotation. To overcome this shortfall, ARCH2S provides a semantically enriched, photo-realistic dataset of 3D architectural models and a benchmark for semantic segmentation. It features real-world buildings of four different purposes, as well as an open architectural landscape in Hong Kong. Each point cloud is annotated into one of 14 semantic classes.

3 papers · 2 benchmarks · 3D, Environment, Point cloud

MM-OR

Operating rooms (ORs) are complex, high-stakes environments requiring a precise understanding of the interactions among medical staff, tools, and equipment to enhance surgical assistance, situational awareness, and patient safety. Current datasets fall short in scale and realism and do not capture the multimodal nature of OR scenes, limiting progress in OR modeling. To this end, MM-OR is a realistic, large-scale, multimodal spatiotemporal OR dataset, and the first to enable multimodal scene graph generation. MM-OR captures comprehensive OR scenes containing RGB-D data, detail views, audio, speech transcripts, robotic logs, and tracking data, and is annotated with panoptic segmentations, semantic scene graphs, and downstream task labels. The authors also propose MM2SG, the first multimodal large vision-language model for scene graph generation, and through extensive experiments demonstrate its ability to effectively leverage multimodal inputs.

3 papers · 7 benchmarks · 3D, Audio, Graphs, Images, Medical, Point cloud, RGB-D, Speech, Texts, Time series, Videos

Freiburg Spatial Relations

The Freiburg Spatial Relations dataset features 546 scenes, each containing two out of 25 household objects. The depicted spatial relations can roughly be described as on top, on top on the corner, inside, inside and inclined, next to, and inclined. The dataset contains the 25 object models as textured .obj and .dae files, a low-resolution .dae version for visualization in rviz, a scene description file containing the translation and rotation of the objects for each scene, a file with labels for each scene, the 15 splits used for cross-validation, and a bash script to convert the models to point clouds.

2 papers · 0 benchmarks · Point cloud

Pano3D

Pano3D is a benchmark for depth estimation from spherical panoramas whose goal is to drive progress on this task in a consistent and holistic manner. The Pano3D 360 depth estimation benchmark provides a standard Matterport3D train/test split, as well as a secondary GibsonV2 partitioning for training and testing. The latter is used to assess zero-shot cross-dataset transfer performance and is decomposed into 3 splits, each focusing on a specific generalization axis.

2 papers · 0 benchmarks · 3D, Images, Point cloud, RGB-D
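
Spherical-panorama depth maps relate directly to point clouds: each equirectangular pixel back-projects along a known ray. A minimal sketch of that standard conversion, run on a synthetic constant-depth panorama:

```python
import numpy as np

def equirect_depth_to_points(depth):
    """Back-project an (H, W) equirectangular depth map to an (H*W, 3) cloud.
    Longitude spans [-pi, pi] across width, latitude [-pi/2, pi/2] over height."""
    h, w = depth.shape
    lon = (np.arange(w) + 0.5) / w * 2 * np.pi - np.pi
    lat = np.pi / 2 - (np.arange(h) + 0.5) / h * np.pi
    lon, lat = np.meshgrid(lon, lat)
    x = depth * np.cos(lat) * np.sin(lon)
    y = depth * np.sin(lat)
    z = depth * np.cos(lat) * np.cos(lon)
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

points = equirect_depth_to_points(np.full((256, 512), 2.0))  # 2 m sphere
print(points.shape)  # (131072, 3)
```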

IBISCape

IBISCape is a simulated benchmark for evaluating multi-modal SLAM systems in large-scale dynamic environments.

2 papers · 0 benchmarks · Environment, Images, Point cloud, RGB Video, RGB-D, Stereo, Videos
Page 4 of 7