The YCB-Ev dataset contains synchronized RGB-D frames and event data, enabling the evaluation of 6DoF object pose estimation algorithms using these modalities. It provides ground-truth 6DoF object poses for the same 21 YCB objects used in the YCB-Video (YCB-V) dataset, allowing cross-dataset evaluation of algorithm performance. The dataset consists of 21 synchronized event and RGB-D sequences, totalling 13,851 frames (7 minutes and 43 seconds of event data). Notably, 12 of these sequences feature the same object arrangement as the YCB-V subset used in the BOP challenge.
Whole-body, low-level control/manipulation demonstration dataset for ManiSkill-HAB. Demonstrations are organized by task-subtask-object. All demos use RGB-D (128×128) and state observations. JSON files store metadata (including event labels and success/failure modes), while HDF5 files store the demonstration data.
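A minimal sketch of how the two file types might be consumed together; the file paths and JSON keys below are hypothetical placeholders, not the dataset's actual layout:

```python
import json
import h5py

# Hypothetical paths following the task-subtask-object organization.
META_PATH = "tidy_house/pick/apple.json"
DEMO_PATH = "tidy_house/pick/apple.h5"

# JSON side: per-episode metadata such as labels and success/failure modes.
with open(META_PATH) as f:
    metadata = json.load(f)
print(f"{len(metadata['episodes'])} episodes in metadata")  # key name assumed

# HDF5 side: the demonstration trajectories themselves (assumed to be stored
# as one group of per-step arrays per trajectory).
with h5py.File(DEMO_PATH, "r") as demos:
    for traj_name, traj in demos.items():
        print(traj_name, list(traj.keys()))
```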
An RGB-D dataset converted from NYUDv2 into COCO-style instance segmentation format. To construct NYUDv2-IS, specifically tailored for instance segmentation, we generated instance masks that delineate individual objects in each image. These masks were labeled using the object class annotations provided in the original NYUDv2 dataset, which is distributed in MATLAB format. The process involved several key steps: (1) extracting binary instance masks, (2) converting these masks into polygon representations, and (3) generating COCO-style annotations. Each annotation includes essential attributes such as category ID, segmentation masks, bounding boxes, object areas, and image metadata. During this conversion, we focused on 9 categories out of the original 13 classes, excluding non-instance categories such as walls and floors. To ensure dataset quality, images without any object annotations were systematically removed.
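A minimal sketch of steps (2) and (3) for a single instance mask, assuming OpenCV and NumPy; everything beyond the standard COCO keys is illustrative rather than the dataset's actual conversion code:

```python
import cv2
import numpy as np

def mask_to_coco_annotation(binary_mask, image_id, category_id, ann_id):
    """Convert one binary instance mask into a COCO-style annotation dict."""
    mask = binary_mask.astype(np.uint8)

    # Step (2): polygon representation of the mask boundary.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    polygons = [c.flatten().tolist() for c in contours if len(c) >= 3]

    # Step (3): COCO-style attributes (bbox as [x, y, w, h], area in pixels).
    x, y, w, h = cv2.boundingRect(mask)
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "segmentation": polygons,
        "bbox": [int(x), int(y), int(w), int(h)],
        "area": int(mask.sum()),
        "iscrowd": 0,
    }
```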
An RGB-D dataset converted from SUN-RGBD into COCO-style instance segmentation format. To transform SUN-RGBD into an instance segmentation benchmark (i.e., SUN-RGBDIS), we employed a pipeline similar to that of NYUDv2-IS. We selected 17 categories from the original 37 classes, carefully omitting non-instance categories like ceilings and walls. Images lacking any identifiable object instances were filtered out to maintain dataset relevance for instance segmentation tasks. We systematically converted the segmentation annotations into COCO format, generating precise bounding boxes, instance masks, and object attributes.
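The category-selection and image-filtering step shared by both conversions could look roughly like the sketch below; the variable names assume COCO-style image and annotation dicts such as those produced above:

```python
def filter_coco(images, annotations, keep_category_ids):
    # Keep only annotations from the selected instance categories,
    # then drop images that are left without any annotation.
    kept_anns = [a for a in annotations if a["category_id"] in keep_category_ids]
    annotated_image_ids = {a["image_id"] for a in kept_anns}
    kept_images = [im for im in images if im["id"] in annotated_image_ids]
    return kept_images, kept_anns
```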
RGB-D instance segmentation box dataset. The Box-IS dataset was created to support research on human-robot collaboration with a focus on robotic manipulation tasks. It was captured using the Intel® RealSense™ Depth Camera D455, a high-performance sensor designed for depth imaging. To ensure precise depth measurements, we bypassed the default depth data processing of the sensor and performed accurate stereo matching directly from the captured left and right IR images. Employing the UniMatch technique, we derived a high-quality depth map from these stereo IR images, which was then aligned with the corresponding RGB image for a comprehensive output. The dataset was intentionally designed to encompass a broad range of scene complexities, from simple box arrangements to highly irregular configurations. This diversity ensures that it can effectively benchmark algorithms across varying levels of difficulty.
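As a rough sketch of the stereo geometry involved (not the dataset's actual processing code), converting a disparity map predicted from the left/right IR images, e.g. by UniMatch, into metric depth only requires the IR focal length and stereo baseline; both constants below are placeholders for the D455 calibration values:

```python
import numpy as np

FX_PIXELS = 640.0    # IR focal length in pixels (placeholder, from calibration)
BASELINE_M = 0.095   # stereo baseline in metres (placeholder, from calibration)

def disparity_to_depth(disparity):
    """Depth in metres from a per-pixel disparity map (pixels)."""
    depth = np.zeros_like(disparity, dtype=np.float32)
    valid = disparity > 0
    depth[valid] = FX_PIXELS * BASELINE_M / disparity[valid]
    return depth  # still needs to be registered to the RGB frame afterwards
```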
The HRI Dataset comprises a total of 3,200 image pairs, each consisting of a clean background image, a depth image, a rain layer mask image, and a rainy image. It contains three scenes (lane, citystreet, and japanesestreet) at a resolution of $2048\times1024$. Each scene is organized by camera viewpoint, capture moment, and rain intensity, with 4 rain intensities per moment: the lane scene has 4 viewpoints with 100 moments each (1,600 image pairs), the citystreet scene has 6 viewpoints with 25 moments each (600 image pairs), and the japanesestreet scene has 10 viewpoints with 25 moments each (1,000 image pairs).
GraspClutter6D is a large-scale real-world dataset for robust object perception and robotic grasping in cluttered environments. It features 1,000 highly cluttered scenes with dense arrangements (average 14.1 objects/scene with 62.6% occlusion), 200 household, industrial, and warehouse objects captured in 75 diverse environment configurations (bins, shelves, and tables), multi-view data from 4 RGB-D cameras (RealSense D415, D435, Azure Kinect, and Zivid One+), and comprehensive annotations including 736K 6D object poses and 9.3 billion feasible robotic grasps for 52K RGB-D images. The dataset provides a challenging testbed for segmentation, 6D pose estimation, and grasp detection algorithms in realistic cluttered scenarios.
An ML-ready global elevation-map dataset, adapting Copernicus DEM GLO-30 to the Major TOM framework.
Daily Activity Recordings for Artificial Intelligence (DARai, pronounced "Dahr-ree") is a multimodal, hierarchically annotated dataset constructed to understand human activities in real-world settings. DARai consists of continuous scripted and unscripted recordings of 50 participants in 10 different environments, totaling over 200 hours of data from 20 sensors, including multiple camera views, depth and radar sensors, wearable inertial measurement units (IMUs), electromyography (EMG), insole pressure sensors, biomonitor sensors, and a gaze tracker. To capture the complexity of human activities, DARai is annotated at three levels of hierarchy: (i) high-level activities (L1) that are independent tasks, (ii) lower-level actions (L2) that are patterns shared between activities, and (iii) fine-grained procedures (L3) that detail the exact execution steps for actions. The dataset annotations and recordings are designed so that 22.7% of L2 actions are shared between L1 activities and 14.2% of L3 procedures are shared between L2 actions.
In this dataset, we teleoperated a UR5 arm to collect manipulation data for picking up a screwdriver in a cluttered tabletop environment.
The DAVIDE dataset consists of synchronized blurred, depth, and sharp videos. The dataset comprises 90 video sequences divided into 69 for training, 7 for validation, and 14 for testing. The test set includes annotations of seven content attributes categorized by: 1) environment (indoor/outdoor), 2) motion (camera motion/camera and object motion), and 3) scene proximity (close/mid/far). These annotations aim to facilitate further analysis into scenarios where depth information could be more beneficial.
The Matador dataset is a material image dataset with hierarchical labels derived from a new taxonomy. For each sample of a material, we collect a local appearance image, a local surface-structure LiDAR scan, and a global context image, and record any camera motion that takes place during the capture sequence. The dataset is intended to grow over time. To date, Matador contains 57 different material categories and a total of ~7,200 images, averaging 126 samples per category to capture intraclass variance.
The Freiburg Campus 3D Scan dataset consists of 3D area maps from the Freiburg campus that were scanned with 3D lasers. Areas include corridors, the outdoor campus, and some of the colleges and buildings.
Freiburg Lighting Adaptable Map Tracking is a dataset for camera trajectory estimation. The dataset consists of two subdatasets, each consisting of a Lighting Adaptable Map and three camera trajectories recorded under varying lighting conditions. The map meshes are stored in PLY format with custom properties and elements. The trajectories contain synchronized RGB-D images, exposure times and gains, ground-truth light settings and camera poses, as well as the camera tracking results presented in the paper.
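A minimal sketch of inspecting one of the Lighting Adaptable Map meshes and its custom PLY elements and properties with the `plyfile` package; the file name is a placeholder, and the actual custom property names are whatever the PLY header declares:

```python
from plyfile import PlyData  # pip install plyfile

mesh = PlyData.read("lighting_adaptable_map.ply")  # placeholder file name
for element in mesh.elements:
    prop_names = [p.name for p in element.properties]
    print(f"{element.name}: {element.count} entries, properties {prop_names}")
```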
The RGB-D Scenes Dataset v2 consists of 14 scenes containing furniture (chair, coffee table, sofa, table) and a subset of the objects in the RGB-D Object Dataset (bowls, caps, cereal boxes, coffee mugs, and soda cans). Each scene is a point cloud created by aligning a set of video frames using Patch Volumes Mapping.
The RGB-D Scenes Dataset contains 8 scenes annotated with objects that belong to the Washington RGB-D Object Dataset. Each scene is a single video sequence consisting of multiple RGB-D frames.
The Freiburg RGB-D People dataset contains 3000+ RGB-D frames acquired in a university hall from three vertically mounted Kinect sensors. The data contains mostly upright walking and standing persons seen from different orientations and with different levels of occlusions.
PAVIS RGB-D is a dataset for person re-identification using depth information. The main motivation is that techniques such as SDALF fail when individuals change their clothing, so they cannot be used for long-term video surveillance. Depth information addresses this problem because it stays constant over a longer period of time. The dataset is composed of four different groups of data collected using the Kinect. The first group ("Collaborative") was obtained by recording 79 people from a frontal view, walking slowly, avoiding occlusions, and with stretched arms, in an indoor scenario where the people were at least 2 meters away from the camera. The second ("Walking1") and third ("Walking2") groups are composed of frontal recordings of the same 79 people walking normally while entering the lab where they normally work. The fourth group ("Backwards") is a back-view recording of the people walking away from the lab. The data