383 machine learning datasets
VasTexture is a large, free repository of textures and PBR materials extracted from real-world images. It contains 500,000 highly diverse textures and PBR materials, all free to download and use. The assets were extracted from natural images using an unsupervised approach (no human intervention). As a result, they are significantly more diverse, but also significantly less refined, than assets made using manual and AI approaches.
A large-scale, egocentric, multimodal, and context-aware dataset of human demonstrations of social navigation.
Heritage Pointcloud Instance Collection dataset, acquired from two large buildings and annotated at a point-wise semantic level based on existing BIM models. Devid Campagnolo, Elena Camuffo, Umberto Michieli, Paolo Borin, Simone Milani and Andrea Giordano, "Fully Automated Scan-to-BIM via Point Cloud Instance Segmentation", in Proceedings of the International Conference on Image Processing (ICIP) 2023.
ConSLAM is a real-world dataset collected periodically on a construction site to measure the accuracy of mobile scanners' SLAM algorithms.
The RAD (Randomly Assembled Object Construction) dataset is a synthetic 3D LEGO dataset designed for the task of Sequential Brick Assembly (SBA).
A large-scale traffic sign and traffic light dataset with accurate 3D positioning and temporally consistent 3D bounding boxes of traffic management objects from up to 200 meters away. The dataset contains additional attributes such as traffic light state, traffic light mask type, traffic sign type, and occlusion. The main application area is 3D traffic light and traffic sign detection for autonomous driving.
This is the official dataset collected to test sim-to-real transfer. It contains 6 articulated object instances, each captured from 20 camera views under 5 states, in scenarios with and without background and with and without distractors.
A small-scale, real-world Project Aria dataset with high-quality static 3D oriented bounding box annotations.
The NuiSI dataset contains skeleton tracking trajectories of human interaction partners performing a variety of physically interactive behaviors (waving, handshaking, rocket fistbump, parachute fistbump) with each other. It is inspired by the dataset in Bütepage et al., "Imitating by generating: Deep generative models for imitation of interactive tasks," Frontiers in Robotics and AI (2020), which was captured with Rokoko motion capture suits. Instead, we track the skeletons of the interaction partners with Intel RealSense cameras using Nuitrack, for a more realistic scenario with noise coming from the depth sensor, the skeleton tracking, and some partial occlusions. This makes the data more representative of real-world interactions with a robot equipped with an RGBD camera. This dataset is used in our papers for training interaction models for Human-Robot Interaction with a humanoid social robot. If you find the dataset useful in your work, please cite our paper:
This dataset supports the research detailed in the pre-print "Virtual Imaging Trials Improved the Transparency and Reliability of AI Systems in COVID-19 Imaging." The study employs both clinical and simulated CT data to evaluate AI models for COVID-19 diagnosis. By leveraging the Virtual Imaging Trials (VIT) framework, the research addresses reproducibility and generalizability issues prevalent in medical imaging AI models.
📚 BlendNet: The dataset contains $12k$ samples. To balance cost savings with data quality and scale, we manually annotated $2k$ samples and used GPT-4o to annotate the remaining $10k$ samples.
📚 CADBench: CADBench is a comprehensive benchmark to evaluate the ability of LLMs to generate CAD scripts. It contains 500 simulated data samples and 200 data samples collected from online forums.
Rethinking Abdominal Organ Segmentation (RAOS) in the clinical scenario: A robustness evaluation benchmark with challenging cases.
Late third instar wing imaginal discs were cultured in Shields and Sang M3 media (Sigma) supplemented with 2% FBS (Sigma), 1% pen/strep (Gibco), 3 ng/ml ecdysone (Sigma) and 2 ng/ml insulin (Sigma). Wing discs were cultured in 35 mm fluorodishes (WPI) under 12 mm filters (Millicell), as described in https://doi.org/10.1038%2Fs41567-019-0618-1
This dataset contains Material-Point-Method (MPM) simulations for various materials, including water, sand, plasticine, elasticity, jelly, rigid collisions, and melting. Each material is represented as point clouds that evolve over time. The dataset is designed for learning and predicting MPM-based physical simulations.
This dataset contains Material-Point-Method (MPM) simulations for various materials, including water, sand, plasticine, jelly, and rigid collisions. Each material is represented as point clouds that evolve over time. The dataset is designed for learning and predicting MPM-based physical simulations. Each material contains 50 trajectories, each with a different initial velocity field.
MeshFleet is a filtered and annotated dataset of high-quality vehicles derived from Objaverse XL. It contains the SHA-256 hashes of the objects together with consistent object captions and vehicle parameters.
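Because the MeshFleet entries are identified by hash rather than by file, a downloaded object has to be matched back to its entry. The following is a minimal sketch of one way to do that, assuming the listed hashes are hex digests of the raw file bytes; the function names `sha256_of_file` and `matches_listed_hash` are hypothetical and not part of any MeshFleet or Objaverse tooling.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large 3D assets are never fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_listed_hash(path, listed_hash):
    """Check a local file against a hash string (assumed to be a hex digest)."""
    return sha256_of_file(path) == listed_hash.strip().lower()
```

Reading in chunks keeps memory use flat regardless of asset size; the comparison normalizes whitespace and case so that hashes copied from a CSV or JSON file still match.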
Synthetic soccer players rendered on top of real-world stadium images in 4K, each covering half a pitch. Ground-truth annotations are provided in the form of precise player locations on the pitch, the 3D location of each player's pelvis, and image bounding boxes.
Structured atmospheric data for AI/ML: long-term, pre-processed atmospheric datasets for use in machine learning/AI-based forecasting. Initially intended to predict AOD, but can be adapted to predict other atmospheric particles.