Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

44 machine learning datasets

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

44 dataset results

VA (Virtual Apartment)

A synthetic depth-estimation benchmark dataset rendered from a high-quality CAD indoor environment.

3 papers · 8 benchmarks · RGB-D, Stereo

VBR (VBR: A Vision Benchmark in Rome)

VBR is a vision and perception research dataset collected in Rome, featuring RGB data, 3D point clouds, IMU, and GPS data. We introduce a new benchmark targeting visual odometry and SLAM to advance research in autonomous robotics and computer vision. This work complements existing datasets by simultaneously addressing several issues, such as environment diversity, motion patterns, and sensor frequency. It uses up-to-date devices and presents effective procedures to accurately calibrate the intrinsics and extrinsics of the sensors while addressing temporal synchronization. During recording, we cover multi-floor buildings, gardens, urban and highway scenarios. Combining handheld and car-based data collections, our setup can simulate any robot (quadrupeds, quadrotors, autonomous vehicles). The dataset includes an accurate 6-DoF ground truth based on a novel methodology that refines the RTK-GPS estimate with LiDAR point clouds through Bundle Adjustment.

3 papers · 0 benchmarks · 3D, LiDAR, Point cloud, RGB Video, Stereo, Tracking

Middlebury MVS

Middlebury MVS is the earliest MVS dataset for multi-view stereo network evaluation. It contains two indoor objects with low-resolution (640 × 480) images and calibrated cameras.

2 papers · 0 benchmarks · Stereo

Middlebury 2003

Middlebury 2003 is a stereo dataset for indoor scenes.

2 papers · 0 benchmarks · Images, Stereo

IBISCape

A simulated benchmark for evaluating multi-modal SLAM systems in large-scale dynamic environments.

2 papers · 0 benchmarks · Environment, Images, Point cloud, RGB Video, RGB-D, Stereo, Videos

Lindenthal Camera Traps

This data set contains 775 video sequences, captured in the wildlife park Lindenthal (Cologne, Germany) as part of the AMMOD project, using an Intel RealSense D435 stereo camera. In addition to color and infrared images, the D435 is able to infer the distance (or “depth”) to objects in the scene using stereo vision. Observed animals include various birds (at daytime) and mammals such as deer, goats, sheep, donkeys, and foxes (primarily at nighttime). A subset of 412 images is annotated with a total of 1038 individual animal annotations, including instance masks, bounding boxes, class labels, and corresponding track IDs to identify the same individual over the entire video.

2 papers · 0 benchmarks · Images, RGB-D, Stereo, Videos
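The depth sensing described above relies on the standard stereo triangulation relation: depth = focal length × baseline / disparity. A minimal sketch of that relation in Python (a generic pinhole-model illustration, not RealSense SDK code; the focal length and baseline values below are placeholders, not the D435's actual calibration):

```python
def depth_from_disparity(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Triangulate metric depth from stereo disparity (pinhole model).

    focal_px: focal length in pixels; baseline_m: distance between the two
    cameras in metres; disparity_px: horizontal pixel shift of a feature
    between the left and right images.
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive (zero means infinite depth)")
    return focal_px * baseline_m / disparity_px

# Illustrative values only: a 30 px disparity seen with a 600 px focal
# length and a 5 cm baseline corresponds to a depth of 1 m.
print(depth_from_disparity(600.0, 0.05, 30.0))  # 1.0
```

Because depth is inversely proportional to disparity, depth resolution degrades quadratically with distance, which is why stereo cameras like the D435 are most accurate at close range.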

BASEPROD (The Bardenas Semi-Desert Planetary Rover Dataset)

BASEPROD provides comprehensive rover sensor data collected over a 1.7 km traverse, accompanied by high-resolution 2D and 3D drone maps of the terrain. The dataset also includes laser-induced breakdown spectroscopy (LIBS) measurements from key sampling sites along the rover's path, as well as weather station data to contextualize environmental conditions.

2 papers · 0 benchmarks · 3D, Environment, Images, Point cloud, RGB-D, Stereo, Tabular, Time series

CE4

Given the difficulty of handling planetary data, we provide downloadable PNG files from the Chang'E-3 and Chang'E-4 missions, together with a set of scripts to perform the conversion for other PDS4 datasets.

1 paper · 0 benchmarks · Environment, Images, Stereo

Real SVBRDF

A total of 80 real material samples were captured in a dark room. For each material, multiple captures were collected at different distances from the camera (between 250 and 650 mm) to observe both macro- and micro-level details. The dataset is mostly comprised of planar specimens but also includes non-planar objects such as mugs, globes, and crumpled paper. It contains a rich diversity of materials, including diffuse or specular wrapping papers, fabrics, anisotropic metals, plastics, rugs, and ceramic and wood flooring samples. Each capture set includes 12 LDR (8 bpp) RGB-D images at 4K resolution. Each set is captured at 50% and 100% of maximum light intensity. In total, we captured 462 such image sets (combinations of light intensity, distance to the camera, and material sample).

1 paper · 0 benchmarks · Images, RGB-D, Stereo

THEOStereo

THEOStereo is a dataset providing synthetic stereo image pairs and their corresponding scene depth, published along with [1]. All images follow the omnidirectional camera model. In total, there are 31,250 omnidirectional image pairs: the training set contains 25,000 image pairs, and the validation and test sets contain 3,125 image pairs each. For each pair, there is a ground-truth depth map describing the pixel-wise distance of the object along the left camera's z-axis. The virtual omnidirectional cameras exhibit a FOV of 180 degrees and can be described using Kannala's camera model [2]. The distortion parameters are k_1 = 1 and k_2 = k_3 = k_4 = k_5 = 0. The length of the stereo camera's baseline was 0.3 AU (approx. 15 cm, not 30 cm!). Please cite [1] if you use the dataset in your work.

1 paper · 0 benchmarks · RGB-D, Stereo
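The Kannala model cited above maps the incidence angle θ to an image radius via an odd polynomial, r(θ) = k₁θ + k₂θ³ + k₃θ⁵ + k₄θ⁷ + k₅θ⁹; with the stated parameters (k₁ = 1, the rest 0), this reduces to the equidistant fisheye projection r(θ) = θ. A minimal sketch of the projection (the focal length is a made-up placeholder, not a THEOStereo calibration value):

```python
import math

def kannala_radius(theta: float, k=(1.0, 0.0, 0.0, 0.0, 0.0)) -> float:
    """Kannala's symmetric radial model: r(theta) = sum_i k_i * theta^(2i+1)."""
    return sum(ki * theta ** (2 * i + 1) for i, ki in enumerate(k))

def project(point, focal_px: float, k=(1.0, 0.0, 0.0, 0.0, 0.0)):
    """Project a 3D point in camera coordinates to image-plane coordinates."""
    x, y, z = point
    theta = math.atan2(math.hypot(x, y), z)  # angle from the optical axis
    phi = math.atan2(y, x)                   # azimuth around the axis
    r = focal_px * kannala_radius(theta, k)  # radial distance from the center
    return (r * math.cos(phi), r * math.sin(phi))

# With k = (1, 0, 0, 0, 0) the model is equidistant: r grows linearly with
# theta, so the 90-degree FOV edge lands at radius focal_px * pi / 2.
```

This linear θ-to-radius mapping is what lets a 180-degree field of view fit on a finite sensor, unlike the pinhole model, whose tan(θ) term diverges at 90 degrees.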

Bus Trajectory Dataset

This dataset contains bus trajectories collected by 6 volunteers who were asked to travel across the suburban city of Durgapur, India, on intra-city buses (route name: 54 Feet). During travel, the volunteers captured sensor logs through an Android application installed on COTS smartphones.

1 paper · 0 benchmarks · Actions, Environment, Stereo

TERRA-REF (TERRA-REF, An open reference data set from high resolution genomics, phenomics, and imaging sensors)

The ARPA-E funded TERRA-REF project is generating open-access reference datasets for the study of plant sensing, genomics, and phenomics. Sensor data were generated by a field scanner sensing platform that captures color, thermal, hyperspectral, and active fluorescence imagery as well as three-dimensional structure and associated environmental measurements. This dataset is provided alongside data collected using traditional field methods in order to support calibration and validation of algorithms used to extract plot-level phenotypes from these datasets.

1 paper · 0 benchmarks · 3D, Biology, Environment, Hyperspectral images, Point cloud, Stereo, Tabular, Time series

rc_49 (rc_49 Grasping Dataset)

Includes several sets of synthetic stereo images labelled with grasp rectangles representing parallel-jaw grasps (Cornell-like format).

1 paper · 0 benchmarks · Images, RGB-D, Stereo

Mars Sample Localization

It contains grayscale mono and stereo images (NavCam and LocCam) from laboratory tests performed by a prototype rover on a Martian-like testbed. The dataset can be used for artificial sample-tube detection and pose estimation. It also contains synthetic color images of the sample tube in a Martian scenario created with Unreal Engine.

1 paper · 0 benchmarks · Images, Stereo

BGG dataset (PUBG Gun Sound Dataset)

We recorded gun sounds in the PUBG environment, varying the type and position of the guns to diversify distances and angles. The BGG dataset consists of 2,195 samples covering 37 different types of guns and five directions, including a silence class in which there is no gunfire but background noise is present. The distance from the firearms ranged from 0 to 600 meters. The audio was recorded in stereo (i.e., two-channel audio), and each sample contains various environmental noises (e.g., water splashing, walking, and bullet friction).

1 paper · 0 benchmarks · Audio, Stereo

INSANE Cross-Domain UAV Data Set (Cross-Domain UAV Data Sets with Increased Number of Sensors for developing Advanced and Novel Estimators)

This data set contains over 600 GB of multimodal data from a Mars analog mission, including accurate 6DoF outdoor ground truth, indoor-outdoor transitions with continuous cross-domain ground truth, and indoor data with Optitrack measurements as ground truth. With 26 flights and a combined distance of 2.5 km, this data set provides various distinct challenges for testing and proving your algorithms. The UAV carries 18 sensors, including a high-resolution navigation camera and a stereo camera with an overlapping field of view, two RTK GNSS sensors with centimeter accuracy, as well as three IMUs placed at strategic locations: hardware-dampened at the center, off-center with a lever arm, and a 1 kHz IMU rigidly attached to the UAV (in case you want to work with unfiltered data). The sensors are fully pre-calibrated, and the data set is ready to use. However, if you want to use your own calibration algorithms, the raw calibration data is also available for download.

1 paper · 0 benchmarks · Environment, Images, Stereo, Tracking

ConsInv Dataset

ConsInv is a stereo RGB + IMU dataset designed for dynamic SLAM testing and contains two subsets.

1 paper · 0 benchmarks · Images, RGB Video, Stereo

Real-World Stereo Color and Sharpness Mismatch Dataset

A real-world stereo video dataset containing 1,200 frame pairs with color and sharpness mismatches caused by a beam splitter.

1 paper · 0 benchmarks · Stereo, Videos

L1BSR (L1BSR dataset)

The Sentinel-2 satellite carries 12 CMOS detectors for the VNIR bands, with adjacent detectors having overlapping fields of view that result in overlapping regions in Level-1B (L1B) images. This dataset includes 3,740 pairs of overlapping image crops extracted from two L1B products. Each crop has a height of around 400 pixels and a variable width that depends on the overlap width between detectors for the RGBN bands, typically around 120-200 pixels. In addition to detector parallax, there is also cross-band parallax for each detector, resulting in shifts between bands. Pre-registration is performed for both cross-band and cross-detector parallax, with a precision of up to a few pixels (typically less than 10 pixels).

1 paper · 0 benchmarks · Images, Stereo
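At its core, the pre-registration step described above estimates a small translational offset between two nearly identical image crops. A toy 1-D version of that idea, using a brute-force mean-squared-error search over candidate integer shifts (an illustration of shift estimation in general, not the method the L1BSR authors actually use):

```python
def best_shift(ref, moving, max_shift=10):
    """Return the integer shift s that best aligns `moving` to `ref`,
    i.e. minimizes the mean squared difference between ref[i] and
    moving[i + s] over their region of overlap."""
    best_s, best_cost = 0, float("inf")
    for s in range(-max_shift, max_shift + 1):
        pairs = [(ref[i], moving[i + s])
                 for i in range(len(ref))
                 if 0 <= i + s < len(moving)]
        if not pairs:
            continue  # no overlap at this shift
        cost = sum((a - b) ** 2 for a, b in pairs) / len(pairs)
        if cost < best_cost:
            best_s, best_cost = s, cost
    return best_s

# A bright feature appearing two samples later in `moving` than in `ref`
# is recovered as a shift of +2.
ref = [0, 0, 0, 5, 9, 5, 0, 0, 0, 0]
mov = [0, 0, 0, 0, 0, 5, 9, 5, 0, 0]
print(best_shift(ref, mov, max_shift=4))  # 2
```

Real cross-detector registration works on 2-D crops and reaches sub-pixel precision (e.g. via correlation-peak interpolation), but the search-over-offsets structure is the same.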

InfraParis

InfraParis is a novel and versatile dataset supporting multiple tasks across three modalities: RGB, depth, and infrared. From the city center to the suburbs, it covers a variety of styles across the greater Paris area, providing rich semantic information. InfraParis contains 7,301 images with bounding boxes and full semantic annotations (19 classes). We assess various state-of-the-art baseline techniques, encompassing models for semantic segmentation, object detection, and depth estimation.

1 paper · 0 benchmarks · Images, RGB Video, RGB-D, Stereo
Page 2 of 3