192 machine learning datasets
192 dataset results
ChangeSim is a dataset aimed at online scene change detection (SCD) and more. The data is collected in photo-realistic simulation environments with the presence of environmental non-targeted variations, such as air turbidity and light condition changes, as well as targeted object changes in industrial indoor environments. By collecting data in simulations, multi-modal sensor data and precise ground truth labels are obtainable such as the RGB image, depth image, semantic segmentation, change segmentation, camera poses, and 3D reconstructions. While the previous online SCD datasets evaluate models given well-aligned image pairs, ChangeSim also provides raw unpaired sequences that present an opportunity to develop an online SCD model in an end-to-end manner, considering both pairing and detection. Experiments show that even the latest pair-based SCD models suffer from the bottleneck of the pairing process, and it gets worse when the environment contains the non-targeted variations.
NTU RGB+D 2D is a curated version of NTU RGB+D often used for skeleton-based action prediction and synthesis. It contains less number of actions.
TransCG is the first large-scale real-world dataset for transparent object depth completion and grasping, which contains 57,715 RGB-D images of 51 transparent objects and many opaque objects captured from different perspectives (~240 viewpoints) of 130 scenes under real-world settings. The samples are captured by two different types of cameras (Realsense D435 & L515).
The MMBody dataset provides human body data with motion capture, GT mesh, Kinect RGBD, and millimeter wave sensor data. See homepage for more details.
This dataset contains 70 (30 falls + 40 activities of daily living) sequences. Fall events are recorded with 2 Microsoft Kinect (RGB + Depth) cameras and corresponding accelerometric data. ADL events are recorded with only one camera and accelerometer. Sensor data was collected using PS Move (60Hz) and x-IMU (256Hz) devices.
The Freiburg Forest dataset was collected using a Viona autonomous mobile robot platform equipped with cameras for capturing multi-spectral and multi-modal images. The dataset may be used for evaluation of different perception algorithms for segmentation, detection, classification, etc. All scenes were recorded at 20 Hz with a camera resolution of 1024x768 pixels. The data was collected on three different days to have enough variability in lighting conditions as shadows and sun angles play a crucial role in the quality of acquired images. The robot traversed about 4.7 km each day. The dataset creators provide manually annotated pixel-wise ground truth segmentation masks for 6 classes: Obstacle, Trail, Sky, Grass, Vegetation, and Void.
A new dataset with significant occlusions related to object manipulation.
The Bimanual Actions Dataset is a collection of 540 RGB-D videos, showing subjects perform bimanual actions in a kitchen or workshop context. The main purpose for its compilation is to research bimanual human behaviour in order to eventually improve the capabilities of humanoid robots.
Generate high-quality 3D ground-truth shapes for reconstruction evaluation is extremely challenging because even 3D scanners can only generate pseudo ground-truth shapes with artefacts. We propose a novel data capturing and 3D annotation pipeline to obtain precise 3D ground-truth shapes without relying on expensive 3D scanners. The key to creating the precise 3D ground-truth shapes is using LEGO models, which are made of LEGO bricks with known geometry. The MobileBrick dataset provides a unique opportunity for future research on high-quality 3D reconstruction thanks to two distinctive features: 1) A large number of RGBD sequences with precise 3D ground-truth annotations. 2) The RGBD images were captured using mobile devices so algorithms can be tested in a realistic setup for mobile AR applications.
IndustReal is an ego-centric, multi-modal dataset where 27 participants are challenged to perform assembly and maintenance procedures on a construction-toy car. The dataset is annotated for action recognition, assembly state detection, and procedure step recognition. IndustReal includes 38 execution errors in a total of 84 videos, with 14 exclusive to validation and test sets and therefore suitable for testing the robustness of algorithms against unseen errors in procedural tasks. IndustReal offers open-source 3D models for all parts to promote the use of synthetic data for scalable approaches on this dataset, as well as reproducibility. All assembly parts used in the dataset are 3D printed. This ensures reproducibility and future availability of the model and allows for growth via community effort.
A Large Dataset of Object Scans is a dataset of more than ten thousand 3D scans of real objects. To create the dataset, the authors recruited 70 operators, equipped them with consumer-grade mobile 3D scanning setups, and paid them to scan objects in their environments. The operators scanned objects of their choosing, outside the laboratory and without direct supervision by computer vision professionals. The result is a large and diverse collection of object scans: from shoes, mugs, and toys to grand pianos, construction vehicles, and large outdoor sculptures. The authors worked with an attorney to ensure that data acquisition did not violate privacy constraints. The acquired data was placed in the public domain and is available freely.
TICaM is a Time-of-flight In-car Cabin Monitoring dataset for vehicle interior monitoring using a single wide-angle depth camera. This dataset addresses the deficiencies of other available in-car cabin datasets in terms of the ambit of labeled classes, recorded scenarios and provided annotations; all at the same time. It consists of an exhaustive list of actions performed while driving and multi-modal labeled images (depth, RGB and IR), with complete annotations for 2D and 3D object detection, instance and semantic segmentation as well as activity annotations for RGB frames. Additional to real recordings, it also contains a synthetic dataset of in-car cabin images with same multi-modality of images and annotations, providing a unique and extremely beneficial combination of synthetic and real data for effectively training cabin monitoring systems and evaluating domain adaptation approaches.
COME15K is an RGB-D saliency detection dataset which contains 15,625 image pairs with high quality polygon-/scribble-/object-/instance-/rank-level annotations.
Endoscopic stereo reconstruction for surgical scenes gives rise to specific problems, including the lack of clear corner features, highly specular surface properties, and the presence of blood and smoke. These issues present difficulties for both stereo reconstruction itself and also for standardised dataset production. We present a stereo-endoscopic reconstruction validation dataset based on cone-beam CT (SERV-CT). Two ex vivo small porcine full torso cadavers were placed within the view of the endoscope with both the endoscope and target anatomy visible in the CT scan. Subsequent orientation of the endoscope was manually aligned to match the stereoscopic view and benchmark disparities, depths and occlusions are calculated. The requirement of a CT scan limited the number of stereo pairs to 8 from each ex vivo sample. For the second sample an RGB surface was acquired to aid alignment of smooth, featureless surfaces. Repeated manual alignments showed an RMS disparity accuracy of around
A large-scale multi-modal dataset to facilitate research and studies that concentrate on vision-wireless systems. The Vi-Fi dataset is a large-scale multi-modal dataset that consists of vision, wireless and smartphone motion sensor data of multiple participants and passer-by pedestrians in both indoor and outdoor scenarios. In Vi-Fi, vision modality includes RGB-D video from a mounted camera. Wireless modality comprises smartphone data from participants including WiFi FTM and IMU measurements.
The Industrial Metal Objects dataset is a diverse dataset of industrial metal objects. These objects are symmetric, textureless and highly reflective, leading to challenging conditions not captured in existing datasets. The dataset contains both real-world and synthetic multi-view RGB images with 6D object pose labels.
We present a large-scale dataset for 3D urban scene understanding. Compared to existing datasets, our dataset consists of 75 outdoor urban scenes with diverse backgrounds, encompassing over 15,000 images. These scenes offer 360◦ hemispherical views, capturing diverse foreground objects illuminated under various lighting conditions. Additionally, our dataset encompasses scenes that are not limited to forward-driving views, addressing the limitations of previous datasets such as limited overlap and coverage between camera views. The closest pre-existing dataset for generalizable evaluation is DTU [2] (80 scenes) which comprises mostly indoor objects and does not provide multiple foreground objects or background scenes.
Video sequences from a glasshouse environment in Campus Kleinaltendorf(CKA), University of Bonn, captured by PATHoBot, a glasshouse monitoring robot.
We introduce HARPER, a novel dataset for 3D body pose estimation and forecast in dyadic interactions between users and \spot, the quadruped robot manufactured by Boston Dynamics. The key-novelty is the focus on the robot's perspective, i.e., on the data captured by the robot's sensors. These make 3D body pose analysis challenging because being close to the ground captures humans only partially. The scenario underlying HARPER includes 15 actions, of which 10 involve physical contact between the robot and users. The Corpus contains not only the recordings of the built-in stereo cameras of Spot, but also those of a 6-camera OptiTrack system (all recordings are synchronized). This leads to ground-truth skeletal representations with a precision lower than a millimeter. In addition, the Corpus includes reproducible benchmarks on 3D Human Pose Estimation, Human Pose Forecasting, and Collision Prediction, all based on publicly available baseline approaches. This enables future HARPER users to
EDEN (Enclosed garDEN) is a multimodal synthetic dataset, a dataset for nature-oriented applications. The dataset features more than 300K images captured from more than 100 garden models. Each image is annotated with various low/high-level vision modalities, including semantic segmentation, depth, surface normals, intrinsic colors, and optical flow.