Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Datasets

192 machine learning datasets matching the RGB-D modality filter

Filter by Modality

  • Images (3,275)
  • Texts (3,148)
  • Videos (1,019)
  • Audio (486)
  • Medical (395)
  • 3D (383)
  • Time series (298)
  • Graphs (285)
  • Tabular (271)
  • Speech (199)
  • RGB-D (192)
  • Environment (148)
  • Point cloud (135)
  • Biomedical (123)
  • LiDAR (95)
  • RGB Video (87)
  • Tracking (78)
  • Biology (71)
  • Actions (68)
  • 3D meshes (65)
  • Tables (52)
  • Music (48)
  • EEG (45)
  • Hyperspectral images (45)
  • Stereo (44)
  • MRI (39)
  • Physics (32)
  • Interactive (29)
  • Dialog (25)
  • MIDI (22)
  • 6D (17)
  • Replay data (11)
  • Financial (10)
  • Ranking (10)
  • CAD (9)
  • fMRI (7)
  • Parallel (6)
  • Lyrics (2)
  • PSG (2)

192 dataset results

DeepLocCross

DeepLocCross is a localization dataset containing RGB-D stereo images captured at 1280 × 720 pixels at a rate of 20 Hz. The ground-truth pose labels are generated using a LiDAR-based SLAM system. In addition to the 6-DoF localization poses of the robot, the dataset contains tracked detections of the observable dynamic objects; each tracked object is annotated with a unique track ID, spatial coordinates, velocity, and orientation angle. Furthermore, as the dataset covers multiple pedestrian crossings, each intersection is labeled with whether it is safe to cross. The dataset consists of seven training sequences with a total of 2,264 images and three testing sequences with a total of 930 images. The dynamic environment in which the dataset was captured makes localization and visual odometry estimation extremely challenging due to varying weather conditions, shadows, and motion blur caused by the moving objects.

1 paper · 0 benchmarks · Images, RGB-D
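The annotation fields listed above map naturally onto a small record type. Below is a minimal, hypothetical sketch: the class and field names (and units) are illustrative only, since the dataset's actual file format is defined by its release.

```python
from dataclasses import dataclass

# Hypothetical record mirroring the annotation fields the description lists
# for each tracked dynamic object; names and units are illustrative, not the
# dataset's actual schema.
@dataclass
class TrackedObject:
    track_id: int        # unique track ID
    x: float             # spatial coordinates of the object
    y: float
    velocity: float      # object speed
    orientation: float   # orientation angle
```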

RMRC 2014

The RMRC 2014 indoor dataset is a dataset for indoor semantic segmentation. It employs the NYU Depth V2 and Sun3D datasets to define the training set. The test data consists of newly acquired images.

1 paper · 0 benchmarks · Images, RGB-D

SPHERE-calorie

SPHERE-calorie contains RGB and depth images and data from two accelerometers, together with ground-truth calorie values from a calorimeter, for calorie-expenditure estimation in home environments.

1 paper · 0 benchmarks · Images, RGB-D, Time series

CLAD (Complex and Long Activities Dataset)

CLAD (Complex and Long Activities Dataset) is an activity dataset that exhibits real-life, diverse scenarios of complex, temporally extended human activities and actions. The dataset consists of a set of videos of actors performing everyday activities in a natural and unscripted manner. It was recorded using a static Kinect 2 sensor, which is commonly used on many robotic platforms. The dataset comprises RGB-D images, point cloud data, and automatically generated skeleton tracks, in addition to crowdsourced annotations.

1 paper · 0 benchmarks · Point cloud, RGB-D, Videos

JHU CoSTAR Block Stacking Dataset

A dataset in which a robot interacts with 5.1 cm colored blocks to complete an order-fulfillment-style block-stacking task. It contains dynamic scenes and real time-series data in a less constrained environment than comparable datasets, with nearly 12,000 stacking attempts and over 2 million frames of real data.

1 paper · 0 benchmarks · 3D, Images, Point cloud, RGB Video, RGB-D

BigBIRD (Big Berkeley Instance Recognition Dataset)

BigBIRD is a 3D dataset of 125 objects, providing calibrated RGB-D images, high-resolution RGB images, and reconstructed point clouds for each object.

1 paper · 0 benchmarks · Images, Point cloud, RGB-D

Real SVBRDF

A total of 80 real material samples were captured in a dark room. For each material, multiple captures were collected at different distances from the camera (between 250 and 650 mm) to observe both macro- and micro-level details. The dataset is mostly comprised of planar specimens but also includes non-planar objects such as mugs, globes, and crumpled paper. It contains a rich diversity of materials, including diffuse and specular wrapping papers, fabrics, anisotropic metals, plastics, rugs, and ceramic and wood flooring samples. Each capture set includes 12 LDR (8 bpp) RGB-D images at 4K resolution, and each set is captured at 50% and 100% of maximum light intensity. In total, 462 such image sets were captured (combinations of light intensity, distance to the camera, and material sample).

1 paper · 0 benchmarks · Images, RGB-D, Stereo

THEOStereo

THEOStereo is a dataset providing synthetic stereo image pairs and their corresponding scene depth, published along with [1]. All images follow the omnidirectional camera model. In total, there are 31,250 omnidirectional image pairs: the training set contains 25,000 pairs, and the validation and test sets contain 3,125 pairs each. For each pair, there is a ground-truth depth map describing the pixel-wise distance of the object along the left camera's z-axis. The virtual omnidirectional cameras exhibit a field of view of 180 degrees and can be described using Kannala's camera model [2], with distortion parameters k_1 = 1 and k_2 = k_3 = k_4 = k_5 = 0. The stereo camera's baseline is 0.3 AU long (approx. 15 cm, not 30 cm). Please cite [1] if you use the dataset in your work.

1 paper · 0 benchmarks · RGB-D, Stereo
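For reference, Kannala's generic camera model maps an incidence angle θ to an image radius via an odd polynomial, r(θ) = k_1 θ + k_2 θ^3 + k_3 θ^5 + k_4 θ^7 + k_5 θ^9, so the stated parameters (k_1 = 1, all others 0) reduce it to the equidistant fisheye projection r = θ. A minimal sketch:

```python
import numpy as np

# Kannala's generic camera model:
#   r(theta) = k1*t + k2*t**3 + k3*t**5 + k4*t**7 + k5*t**9
def kannala_radius(theta, k=(1.0, 0.0, 0.0, 0.0, 0.0)):
    t = np.asarray(theta, dtype=float)
    return k[0] * t + k[1] * t**3 + k[2] * t**5 + k[3] * t**7 + k[4] * t**9

# With THEOStereo's parameters (k1 = 1, k2..k5 = 0) this is the equidistant
# projection r = theta: a ray at the edge of the 180-degree field of view
# (theta = pi/2) lands at normalized radius pi/2.
print(kannala_radius(np.pi / 2))  # 1.5707963...
```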

Omniverse Object dataset

Omniverse Object is a large-scale synthetic dataset of 60,000 images including both transparent and opaque objects in different scenes. It is used for depth completion of transparent objects from a single RGB-D view.

1 paper · 0 benchmarks · RGB-D

Boombox

Boombox is a multi-modal dataset for visual reconstruction from acoustic vibrations. It involves dropping objects into a box and capturing the resulting images and vibrations, and it is used for training ML systems that predict images from vibration.

1 paper · 0 benchmarks · 3D, Audio, Images, RGB-D, Time series

The RBO Dataset of Articulated Objects and Interactions

The RBO dataset of articulated objects and interactions is a collection of 358 RGB-D video sequences (67:18 minutes) of humans manipulating 14 articulated objects under varying conditions (light, perspective, background, interaction). All sequences are annotated with ground truth of the poses of the rigid parts and the kinematic state of the articulated object (joint states) obtained with a motion capture system. We also provide complete kinematic models of these objects (kinematic structure and three-dimensional textured shape models). In 78 sequences the contact wrenches during the manipulation are also provided.

1 paper · 0 benchmarks · 3D meshes, Point cloud, RGB-D, Time series, Videos

BIDCD (Bosch Industrial Depth Completion Dataset)

The Bosch Industrial Depth Completion Dataset (BIDCD) is an RGB-D dataset of static table-top scenes with industrial objects. The data was collected with a RealSense depth camera mounted on a robotic arm, i.e. from multiple points of view (POVs), approximately 60 for each scene. Depth ground truth was generated with a customized pipeline: erroneous depth values were removed, and multi-view geometry was applied to fuse the cleaned depth frames and fill in missing information. The fused scene mesh was back-projected to each POV, and finally a bilateral filter was applied to reduce the remaining holes.

1 paper · 0 benchmarks · Images, RGB-D
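The final hole-reduction step can be illustrated with a stock bilateral filter, which smooths a depth map within surfaces while preserving depth discontinuities at object edges. This is a minimal sketch, not the authors' exact pipeline; the file name and millimetre-to-metre scaling are assumptions.

```python
import cv2
import numpy as np

# Minimal sketch of bilateral filtering on a depth map (not the authors'
# exact pipeline). The file name and depth scaling below are assumptions.
raw = cv2.imread("scene_depth.png", cv2.IMREAD_UNCHANGED)
depth_m = raw.astype(np.float32) / 1000.0  # assume 16-bit depth in millimetres

# sigmaColor acts on depth values (in metres), sigmaSpace on pixel distance:
# nearby pixels with similar depth are averaged, edges are preserved.
smoothed = cv2.bilateralFilter(depth_m, d=5, sigmaColor=0.05, sigmaSpace=5.0)
```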

Image-based size estimation of broccoli heads under varying degrees of occlusion

This publicly available dataset contains 1613 RGB-D images of field-grown broccoli plants. The dataset also includes the polygon and circle annotations of the broccoli heads.

1 paper · 0 benchmarks · Biology, Images, RGB-D

Ladybird Cobbitty 2017 Brassica Dataset

This dataset contains weekly scans of cauliflower and broccoli covering a ten-week growth cycle from transplant to harvest. It includes ground-truth physical characteristics of the crop; environmental data collected by a weather station and a soil-sensor network; and scans of the crop performed by an autonomous agricultural robot, which include stereo colour, thermal, and hyperspectral imagery. The crops were planted at Lansdowne Farm, a University of Sydney agricultural research and teaching facility located in Cobbitty, a suburb 70 km south-west of Sydney in New South Wales (NSW), Australia. Four 80-metre raised crop beds were prepared with a north-south orientation, with approximately 144 Brassica planted in each bed. Cauliflower were planted in the first and third beds (from west to east), and broccoli in the second and fourth beds.

1 paper · 0 benchmarks · Biology, Hyperspectral images, Images, RGB-D

MetaGraspNet 1 (MetaGraspNet difficulty 1 - easy)

There has been increasing interest in smart factories powered by robotics systems to tackle repetitive, laborious tasks. One particularly impactful yet challenging task in robotics-powered smart-factory applications is robotic grasping: using robotic arms to grasp objects autonomously in different settings. Robotic grasping requires a variety of computer vision tasks such as object detection, segmentation, grasp prediction, and pick planning. While significant progress has been made in leveraging machine learning for robotic grasping, particularly with deep learning, a big challenge remains: the need for large-scale, high-quality RGB-D datasets that cover a wide diversity of scenarios and permutations. This entry is the difficulty-1 (easy) split.

1 paper · 0 benchmarks · RGB-D

MetaGraspNet 2 (MetaGraspNet difficulty 2 - medium)

The same dataset family as MetaGraspNet 1 above; this entry is the difficulty-2 (medium) split.

1 paper · 0 benchmarks · RGB-D

MetaGraspNet 3 (MetaGraspNet difficulty 3 - hard 1)

The same dataset family as MetaGraspNet 1 above; this entry is the difficulty-3 (hard 1) split.

1 paper · 0 benchmarks · RGB-D

MetaGraspNet 4 (MetaGraspNet difficulty 4 - hard 2)

The same dataset family as MetaGraspNet 1 above; this entry is the difficulty-4 (hard 2) split.

1 paper · 0 benchmarks · RGB-D

MetaGraspNet 5 (MetaGraspNet difficulty 5 - very hard)

The same dataset family as MetaGraspNet 1 above; this entry is the difficulty-5 (very hard) split.

1 paper · 0 benchmarks · RGB-D

Multiview Manipulation Data (Multiview Manipulation Expert Data and Trained Models)

Accompanying expert data and trained models for the 2021 IROS paper on multiview manipulation.

1 paper · 0 benchmarks · RGB-D
Page 7 of 10