Are current 3D object tracking methods truly robust enough for low-fidelity depth sensors like the iPhone LiDAR? We introduce DTTD-Mobile (fully compatible with the YCB toolbox), a new benchmark built on real-world data captured from mobile devices: 18 objects observed in 100 videos, with 47,668 sampled frames and 114,143 object annotations. We evaluate several popular methods, including BundleSDF, ES6D, MegaPose, and DenseFusion, and highlight their limitations in this challenging setting.
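Since DTTD-Mobile is advertised as YCB-toolbox compatible, a frame's annotations can presumably be read the way YCB-Video metadata is. A minimal sketch under that assumption; the file names, directory layout, and .mat keys below follow YCB-Video conventions and are not confirmed by the dataset description:

```python
# Sketch of reading one frame in a YCB-Video-style layout.
# All paths and .mat keys are assumptions based on the YCB toolbox
# conventions, not verified against DTTD-Mobile itself.
import numpy as np
import scipy.io as sio
from PIL import Image

prefix = "data/0001/000001"  # hypothetical frame prefix

color = np.asarray(Image.open(f"{prefix}-color.png"))  # RGB image
depth = np.asarray(Image.open(f"{prefix}-depth.png"))  # raw depth map
meta = sio.loadmat(f"{prefix}-meta.mat")

poses = meta["poses"]              # 3x4xK object poses (R|t) for K objects
cls_indexes = meta["cls_indexes"]  # class ids of objects present in the frame
K_intr = meta["intrinsic_matrix"]  # 3x3 camera intrinsics
depth_m = depth / meta["factor_depth"]  # convert raw depth units to meters
```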
MonoPerfCap is a benchmark dataset for human 3D performance capture from monocular video input. It consists of around 40k frames covering a variety of scenarios.
This is a 3D action recognition dataset, also known as the 3D Action Pairs dataset. The actions in this dataset are selected in pairs such that the two actions of each pair are similar in motion (they have similar trajectories) and shape (they involve similar objects); however, the motion-shape relation is different.
3D-FRONT (3D Furnished Rooms with layOuts and semaNTics) is a large-scale, comprehensive repository of synthetic indoor scenes, highlighted by professionally designed layouts and a large number of rooms populated with high-quality textured 3D models with style compatibility. From layout semantics down to the texture details of individual objects, the dataset is freely available to the academic community and beyond.
Understanding spatial relations (e.g., “laptop on table”) in visual input is important for both humans and robots. Existing datasets are insufficient, as they lack large-scale, high-quality 3D ground-truth information, which is critical for learning spatial relations. In this paper, we fill this gap by constructing Rel3D: the first large-scale, human-annotated dataset for grounding spatial relations in 3D. Rel3D enables quantifying the effectiveness of 3D information in predicting spatial relations on large-scale human data. Moreover, we propose minimally contrastive data collection, a novel crowdsourcing method for reducing dataset bias. The 3D scenes in our dataset come in minimally contrastive pairs: two scenes in a pair are almost identical, but a spatial relation holds in one and fails in the other. We empirically validate that minimally contrastive examples can diagnose issues with current relation detection models as well as lead to sample-efficient training. Code and data are available.
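To make the pairing concrete, here is a hedged sketch of how a minimally contrastive pair might be represented; the field names are illustrative, not Rel3D's actual schema:

```python
from dataclasses import dataclass

# Illustrative record for a minimally contrastive pair (field names are
# hypothetical, not Rel3D's real schema). The two scenes are nearly
# identical; the relation holds in one and fails in the other.
@dataclass
class ContrastivePair:
    relation: str   # e.g. "laptop on table"
    scene_pos: str  # scene id where the relation holds
    scene_neg: str  # near-identical scene id where it fails

pair = ContrastivePair("laptop on table", "scene_0412_a", "scene_0412_b")
# A model that predicts the same label for both scenes of a pair is
# likely relying on 2D or bias cues rather than the 3D configuration.
```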
The 2021 Kidney and Kidney Tumor Segmentation challenge (abbreviated KiTS21) is a competition in which teams compete to develop the best system for automatic semantic segmentation of renal tumors and surrounding anatomy.
Ubisoft La Forge Animation Dataset ("LAFAN1") is the animation dataset and accompanying code for the SIGGRAPH 2020 paper Robust Motion In-betweening.
Interiorverse is a high-quality indoor scene dataset with rich details, including complex furniture and decorations. It is rendered with the GGX BRDF model, which has stronger material-modeling capability than simpler BRDF models.
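For reference, the expressiveness of the GGX model comes from its heavy-tailed microfacet normal distribution, which produces more realistic specular highlights than Phong-style lobes. A minimal sketch of the standard GGX (Trowbridge-Reitz) normal distribution function, kept separate from anything Interiorverse-specific:

```python
import numpy as np

# GGX (Trowbridge-Reitz) microfacet normal distribution function:
#   D(h) = alpha^2 / (pi * ((n.h)^2 * (alpha^2 - 1) + 1)^2)
# alpha is the surface roughness; cos_nh is dot(n, h) between the
# surface normal and the half-vector.
def ggx_ndf(cos_nh: np.ndarray, alpha: float) -> np.ndarray:
    a2 = alpha ** 2
    denom = np.pi * (cos_nh ** 2 * (a2 - 1.0) + 1.0) ** 2
    return a2 / denom

# Example: rougher surfaces spread highlight energy over wider angles.
angles = np.deg2rad([0.0, 15.0, 30.0])
print(ggx_ndf(np.cos(angles), alpha=0.1))  # sharp, glossy lobe
print(ggx_ndf(np.cos(angles), alpha=0.5))  # broad, rough lobe
```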
FPv1 (formerly FAUST-partial) is a 3D registration benchmark dataset created to address the lack of data variability in existing 3D registration benchmarks such as 3DMatch, ETH, and KITTI.
The MMBody dataset provides human body data with motion capture, ground-truth meshes, Kinect RGB-D, and millimeter-wave sensor data. See the homepage for more details.
Human Bodies in the Wild (HBW) is a validation and test set for body shape estimation. It consists of images taken in the wild and ground-truth 3D body scans in SMPL-X topology. To create HBW, we collect body scans of 35 participants and register the SMPL-X model to the scans. Each participant is further photographed in various outfits and poses in front of a white background and also uploads full-body photos of themselves taken in the wild. The validation and test set images are released; the ground-truth shape is released only for the validation set.
The challenge of accurately segmenting individual trees from laser scanning data hinders the assessment of crucial tree parameters necessary for effective forest management, impacting many downstream applications. While dense laser scanning offers detailed 3D representations, automating the segmentation of trees and their structures from point clouds remains difficult. The lack of suitable benchmark datasets and reliance on small datasets have limited method development. The emergence of deep learning models exacerbates the need for standardized benchmarks. Addressing these gaps, the FOR-instance data represent a novel benchmarking dataset to enhance forest measurement using dense airborne laser scanning data, aiding researchers in advancing segmentation methods for forested 3D scenes.
RCooper, the first real-world, large-scale roadside cooperative perception dataset, is released to foster research on roadside cooperative perception for practical applications. More than 50k images and 30k point clouds are provided, manually annotated with 3D bounding boxes and trajectories for ten semantic classes.
AIOZ-GDANCE comprises 16.7 hours of whole-body motion and music audio of group dancing. The duration of each video in the dataset ranges from 15 to 60 seconds.
Falling Things (FAT) is a dataset for advancing the state of the art in object detection and 3D pose estimation in the context of robotics. It consists of 60k generated photorealistic images with accurate 3D pose annotations for all objects.
A new dataset with significant occlusions related to object manipulation.
VIPER is a benchmark suite for visual perception. The benchmark is based on more than 250K high-resolution video frames, all annotated with ground-truth data for both low-level and high-level vision tasks, including optical flow, semantic instance segmentation, object detection and tracking, object-level 3D scene layout, and visual odometry. Ground-truth data for all tasks is available for every frame. The data was collected while driving, riding, and walking a total of 184 kilometers in diverse ambient conditions in a realistic virtual world.
SPACE is a simulator for physical interactions and causal learning in 3D environments. The SPACE simulator is used to generate the SPACE dataset, a synthetic video dataset in a 3D environment, to systematically evaluate physics-based models on a range of physical causal reasoning tasks. Inspired by daily object interactions, the SPACE dataset comprises videos depicting three types of physical events: containment, stability, and contact.
Tasks. In moving object segmentation of point cloud sequences, one has to provide motion labels for each point of the test sequences 11-21. The input to all evaluated methods is therefore a list of coordinates of the three-dimensional points along with their remission, i.e., the strength of the reflected laser beam, which depends on the properties of the surface that was hit. Each method should then output a label for each point of a scan, i.e., one full turn of the rotating LiDAR sensor. Here, we only distinguish between static and moving object classes.
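A minimal sketch of the expected per-scan I/O, assuming the SemanticKITTI binary scan layout (N x 4 float32: x, y, z, remission); the file paths and the trivial static-only predictor are placeholders, not part of any evaluated method:

```python
import numpy as np

# Assumed label encoding for this sketch; real benchmarks may use
# different integer ids for the static/moving classes.
STATIC, MOVING = 0, 1

def load_scan(path: str) -> np.ndarray:
    # One full LiDAR turn; columns are x, y, z, remission.
    return np.fromfile(path, dtype=np.float32).reshape(-1, 4)

def predict_motion_labels(scan: np.ndarray) -> np.ndarray:
    # Placeholder predictor: a real method would use sequence context
    # to separate moving from static points; here everything is static.
    return np.full(scan.shape[0], STATIC, dtype=np.uint32)

scan = load_scan("sequences/11/velodyne/000000.bin")  # hypothetical path
labels = predict_motion_labels(scan)                  # one label per point
labels.tofile("predictions/11/000000.label")          # per-point output file
```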