9 machine learning datasets
Scan2CAD is an alignment dataset based on 1506 ScanNet scans with 97607 annotated keypoint pairs between 14225 (3049 unique) CAD models from ShapeNet and their counterpart objects in the scans. The top 3 annotated model classes are chairs, tables, and cabinets, which reflects the nature of indoor scenes in ScanNet. The number of objects aligned per scene ranges from 1 to 40, with an average of 9.3.
BIKED is a dataset of 4500 individually designed bicycle models sourced from hundreds of designers. BIKED enables a variety of data-driven design applications for bicycles and generally supports the development of data-driven design methods. The dataset provides a variety of design information, including assembly images, component images, numerical design parameters, and class labels.
FloorPlanCAD is a large-scale real-world CAD drawing dataset containing over 15,000 floor plans, ranging from residential to commercial buildings.
IndustReal is an ego-centric, multi-modal dataset where 27 participants are challenged to perform assembly and maintenance procedures on a construction-toy car. The dataset is annotated for action recognition, assembly state detection, and procedure step recognition. IndustReal includes 38 execution errors in a total of 84 videos, with 14 exclusive to validation and test sets and therefore suitable for testing the robustness of algorithms against unseen errors in procedural tasks. IndustReal offers open-source 3D models for all parts to promote the use of synthetic data for scalable approaches on this dataset, as well as reproducibility. All assembly parts used in the dataset are 3D printed. This ensures reproducibility and future availability of the model and allows for growth via community effort.
A video capsule endoscopy dataset for polyp segmentation.
This dataset comprises a 3D building information model (in IFC and Revit formats), manually created from the terrestrial laser scanner data of sequence 2 of ConSLAM, and the refined ground-truth (GT) poses (in TUM format) of sessions 2, 3, 4, and 5 of the open-access ConSLAM dataset (which provides camera, LiDAR, and IMU measurements).
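For readers unfamiliar with the TUM trajectory format mentioned above, the following is a minimal sketch of a parser. It assumes the usual TUM convention of one pose per line as `timestamp tx ty tz qx qy qz qw`, with `#` marking comment lines. The names `Pose` and `parse_tum_trajectory` are hypothetical and are not part of the dataset's tooling.

```python
from typing import List, NamedTuple


class Pose(NamedTuple):
    """One trajectory sample: timestamp, translation, unit quaternion (hypothetical helper type)."""
    timestamp: float
    tx: float
    ty: float
    tz: float
    qx: float
    qy: float
    qz: float
    qw: float


def parse_tum_trajectory(text: str) -> List[Pose]:
    """Parse TUM-style trajectory text into a list of Pose tuples.

    Assumes each data line holds exactly 8 whitespace-separated numbers;
    blank lines and lines starting with '#' are skipped.
    """
    poses = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 8:
            raise ValueError(f"line {line_no}: expected 8 fields, got {len(fields)}")
        poses.append(Pose(*(float(f) for f in fields)))
    return poses


if __name__ == "__main__":
    # Illustrative values only; not taken from the dataset.
    sample = "# timestamp tx ty tz qx qy qz qw\n1.0 0.1 0.2 0.3 0 0 0 1\n"
    for pose in parse_tum_trajectory(sample):
        print(pose.timestamp, pose.tx, pose.ty, pose.tz)
```

In practice the text would be read from one of the pose files shipped with the dataset; the file names and layout should be taken from the dataset's own documentation.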
The buildingSMART Data Dictionary (bSDD) is an online service that hosts classifications and their properties, allowed values, units and translations. The bSDD allows linking between all the content inside the database. It provides a standardized workflow to guarantee data quality and information consistency.
📚 BlendNet: The dataset contains 12k samples. To balance cost savings with data quality and scale, we manually annotated 2k samples and used GPT-4o to annotate the remaining 10k samples.
📚 CADBench: A comprehensive benchmark for evaluating the ability of LLMs to generate CAD scripts. It contains 500 simulated data samples and 200 data samples collected from online forums.