192 machine learning datasets
Includes several sets of synthetic stereo images labelled with grasp rectangles representing parallel-jaw grasps (Cornell-like format).
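In the Cornell convention, each grasp rectangle is stored as four consecutive "x y" vertex lines; a minimal parser under that assumption (the exact file layout of this dataset may differ) could look like:

```python
from pathlib import Path

def load_grasp_rectangles(path):
    """Parse a Cornell-style annotation file: four 'x y' vertex lines
    per parallel-jaw grasp rectangle (file layout is an assumption)."""
    points = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if line:
            x, y = line.split()
            points.append((float(x), float(y)))
    # Group every four consecutive vertices into one rectangle.
    return [points[i:i + 4] for i in range(0, len(points) - 3, 4)]
```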
PLAD is a dataset in which sparse depth is provided by line-based visual SLAM; it is used to verify StructMDC.
This dataset includes multi-spectral acquisitions of vegetation for the design of new DeepIndices. The images were acquired with the Airphen (Hyphen, Avignon, France) six-band multi-spectral camera configured with the 450/570/675/710/730/850 nm bands at 10 nm FWHM. The dataset was acquired at the INRAe site in Montoldre (Allier, France, at 46°20'30.3"N 3°26'03.6"E) within the framework of the “RoSE challenge” funded by the French National Research Agency (ANR), and in Dijon (Burgundy, France, at 47°18'32.5"N 5°04'01.8"E) on the AgroSup Dijon site. Images of bean and corn, containing various natural weeds (yarrow, amaranth, geranium, plantago, etc.) and sown ones (mustard, goosefoot, mayweed, and ryegrass), were acquired in top-down view at 1.8 meters from the ground under very distinct illumination conditions (shadow, morning, evening, full sun, cloudy, rain, ...). (2020-05-01)
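As a point of reference for what a learned DeepIndex generalizes, the classic NDVI can be computed from the 675 nm (red) and 850 nm (near-infrared) bands; a minimal sketch, assuming the two bands are already loaded as aligned arrays:

```python
import numpy as np

def ndvi(red_675, nir_850, eps=1e-6):
    """Normalized Difference Vegetation Index from the 675 nm and
    850 nm bands; eps guards against division by zero."""
    red = red_675.astype(np.float32)
    nir = nir_850.astype(np.float32)
    return (nir - red) / (nir + red + eps)
```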
In this dataset, two robots, Baxter and UR5, perform 8 behaviors (look, grasp, pick, hold, shake, lower, drop, and push) on 95 objects that vary by 5 colors (blue, green, red, white, and yellow), 6 contents (wooden buttons, plastic dice, glass marbles, nuts & bolts, pasta, and rice), and 4 weights (empty, 50 g, 100 g, and 150 g). There are 90 objects with contents (5 colors x 3 weights x 6 contents) and 5 objects without any content that vary only by color. Both robots perform 5 trials on each object, resulting in 7,600 interactions (2 robots x 8 behaviors x 95 objects x 5 trials).
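The interaction count follows directly from the full factorial design; a quick sanity check in Python:

```python
from itertools import product

robots = ["Baxter", "UR5"]
behaviors = ["look", "grasp", "pick", "hold", "shake", "lower", "drop", "push"]
objects = range(95)
trials = range(5)

# Every (robot, behavior, object, trial) combination is one interaction.
interactions = list(product(robots, behaviors, objects, trials))
assert len(interactions) == 7600  # 2 x 8 x 95 x 5
```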
HOWS-CL-25 (Household Objects Within Simulation dataset for Continual Learning) is a synthetic dataset designed specifically for object classification on mobile robots operating in a changing environment (such as a household), where it is important to learn new, never-seen objects on the fly. The dataset can also be used for other learning use cases, such as instance segmentation or depth estimation, or wherever household objects or continual learning are of interest.
One-Shot Affordance Part Segmentation variant of the UMD dataset. Each object instance in the dataset contains a single image.
The Robot Tracking Benchmark (RTB) is a synthetic dataset that facilitates the quantitative evaluation of 3D tracking algorithms for multi-body objects. It was created using the procedural rendering pipeline BlenderProc. The dataset contains photo-realistic sequences with HDRi lighting and physically-based materials. Perfect ground-truth annotations for camera and robot trajectories are provided in the BOP format. Many physical effects, such as motion blur, rolling shutter, and camera shaking, are accurately modeled to reflect real-world conditions. For each frame, four depth qualities exist to simulate sensors with different characteristics. While the first quality provides perfect ground truth, the second considers measurements with the distance-dependent noise characteristics of the Azure Kinect time-of-flight sensor. Finally, for the third and fourth qualities, two stereo RGB images with and without a pattern from a simulated dot projector were rendered; depth images were then reconstructed from these stereo pairs.
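Since the annotations follow the BOP format, the per-frame object poses live in scene_gt.json files; a minimal reader, with the file path as the only assumption (field names follow the public BOP specification):

```python
import json
import numpy as np

def load_bop_scene_gt(path):
    """Read BOP-format ground truth: scene_gt.json maps each frame id
    to a list of per-object pose annotations."""
    with open(path) as f:
        scene_gt = json.load(f)
    poses = {}
    for frame_id, annotations in scene_gt.items():
        poses[int(frame_id)] = [
            (ann["obj_id"],
             np.array(ann["cam_R_m2c"]).reshape(3, 3),  # rotation, model-to-camera
             np.array(ann["cam_t_m2c"]))                # translation in mm
            for ann in annotations
        ]
    return poses
```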
In this dataset, a UR5 robot used 6 tools (metal-scissor, metal-whisk, plastic-knife, plastic-spoon, wooden-chopstick, and wooden-fork) to perform 6 behaviors: look, stirring-slow, stirring-fast, stirring-twist, whisk, and poke. The robot explored 15 objects kept in cylindrical containers: cane-sugar, chia-seed, chickpea, detergent, empty, glass-bead, kidney-bean, metal-nut-bolt, plastic-bead, salt, split-green-pea, styrofoam-bead, water, wheat, and wooden-button. The robot performed 10 trials on each object with each tool, resulting in 5,400 interactions (6 tools x 6 behaviors x 15 objects x 10 trials). The robot recorded multiple sensory data (audio, RGB images, depth images, haptics, and touch images) while interacting with the objects.
The dataset includes synthetic data generated by rendering the 3D meshes of LM objects and several household objects in Blender for training 6D pose estimation algorithms. The whole dataset contains synthetic data for 18 objects (13 from LM and 5 household objects), with 20,000 data samples per object. Each data sample includes an RGB image in .png format and a depth image in .exr format, along with mask labels in .png format and ground-truth pose labels saved in .json files. Apart from the training data, the 3D meshes of the objects and pre-trained models of the 6D pose estimation algorithm are also included. The whole dataset takes approximately 1 TB of storage.
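Given the stated formats (.png RGB and mask, .exr depth, .json pose), one way to load a sample with OpenCV might look like the sketch below; the file-naming scheme is a hypothetical placeholder:

```python
import os
os.environ["OPENCV_IO_ENABLE_OPENEXR"] = "1"  # allow OpenCV to read .exr depth
import json
import cv2

def load_sample(stem):
    """Load one training sample; only the formats are given by the
    dataset description, the '<stem>_*' naming is an assumption."""
    rgb = cv2.imread(f"{stem}_rgb.png", cv2.IMREAD_COLOR)
    depth = cv2.imread(f"{stem}_depth.exr", cv2.IMREAD_UNCHANGED)
    mask = cv2.imread(f"{stem}_mask.png", cv2.IMREAD_GRAYSCALE)
    with open(f"{stem}_pose.json") as f:
        pose = json.load(f)
    return rgb, depth, mask, pose
```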
VR-Folding contains garment meshes of 4 categories from the CLOTH3D dataset, namely Shirt, Pants, Top, and Skirt. For the flattening task there are 5,871 videos containing 585K frames in total; for the folding task there are 3,896 videos containing 204K frames in total. The data for each frame include multi-view RGB-D images, object masks, full garment meshes, and hand poses.
ARKitTrack is a new RGB-D tracking dataset for both static and dynamic scenes, captured with the consumer-grade LiDAR scanners built into Apple's iPhone and iPad. ARKitTrack contains 300 RGB-D sequences, 455 targets, and 229.7K video frames in total, with 123.9K pixel-level target masks along with bounding-box annotations and frame-level attributes.
NBMOD is a dataset created for researching specific-object grasp detection by robots in noisy environments. It comprises three subsets: the Simple-background Single-object Subset (SSS), the Noisy-background Single-object Subset (NSS), and the Multi-Object grasp detection Subset (MOS). The SSS subset contains 13,500 images, the NSS subset 13,000 images, and the MOS subset 5,000 images.
This dataset contains 6,700 executed scoops (excavations), spanning a wide range of materials, terrain topographies, and compositions.
InfraParis is a novel and versatile dataset supporting multiple tasks across three modalities: RGB, depth, and infrared. From the city to the suburbs, it covers a variety of scene styles across the greater Paris region, providing rich semantic information. InfraParis contains 7,301 images with bounding boxes and full semantic annotations (19 classes). We assess various state-of-the-art baseline techniques, encompassing models for semantic segmentation, object detection, and depth estimation.
We introduce an RGB+S dataset named the “Industrial Human Action Recognition Dataset” (InHARD), collected in a real-world setting for industrial human action recognition, with over 2 million frames from 16 distinct subjects. This dataset contains 13 different industrial action classes and over 4,800 action samples. It should enable the study and development of various learning techniques for analyzing human actions in industrial environments involving human-robot collaboration.
The dataset is recorded with an on-vehicle ZED stereo camera in both urban and rural environments.
Understanding comprehensive assembly knowledge from videos is critical for a futuristic, ultra-intelligent industry. To enable technological breakthroughs, we present HA-ViD, an assembly video dataset that features representative industrial assembly scenarios, a natural procedural knowledge acquisition process, and consistent human-robot shared annotations. Specifically, HA-ViD captures diverse collaboration patterns of real-world assembly, natural human behaviors and learning progression during assembly, and fine-grained action annotations decomposed into subject, action verb, manipulated object, target object, and tool. We provide 3,222 multi-view and multi-modality videos, 1.5M frames, 96K temporal labels, and 2M spatial labels. We benchmark four foundational video understanding tasks: action recognition, action segmentation, object detection, and multi-object tracking. Importantly, we analyze their performance and the further reasoning steps required for comprehending knowledge of assembly progress and process efficiency.
A large-scale isolated Indian sign language dataset. It contains 2,002 common words used in daily communication among the Indian deaf community. The dataset contains 40,033 videos across the 2,002 words, with a total duration of around 36.2 hours and 7.8 million frames.
A large-scale, egocentric, multimodal, and context-aware dataset of human demonstrations of social navigation.
The ViCoS Towel Dataset is a state-of-the-art benchmark for grasp point localization on cloth objects, specifically towels. Designed to advance research in robotic grasping and perception for textile objects, this dataset includes a collection of 8,000 high-resolution RGB-D images (1920×1080) captured with a Kinect V2 under a variety of conditions. Each image provides detailed depth information, making it ideal for training deep learning models and conducting thorough benchmarking.
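A predicted grasp point (u, v) on such RGB-D data is typically lifted to 3-D via pinhole back-projection; a minimal sketch, with the Kinect V2 intrinsics left as calibration inputs rather than hard-coded values:

```python
import numpy as np

def backproject(depth_m, fx, fy, cx, cy):
    """Back-project a depth map (in meters) to an (H, W, 3) point cloud
    using the pinhole model; intrinsics come from sensor calibration."""
    h, w = depth_m.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return np.stack([x, y, depth_m], axis=-1)
```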