PolarNet: 3D Point Clouds for Language-Guided Robotic Manipulation

Shizhe Chen, Ricardo Garcia, Cordelia Schmid, Ivan Laptev

Published: 2023-09-27
Tasks: Multi-Task Learning, Robot Manipulation, Robot Manipulation Generalization

Abstract

The ability for robots to comprehend and execute manipulation tasks based on natural language instructions is a long-term goal in robotics. The dominant approaches for language-guided manipulation use 2D image representations, which face difficulties in combining multi-view cameras and inferring precise 3D positions and relationships. To address these limitations, we propose a 3D point cloud based policy called PolarNet for language-guided manipulation. It leverages carefully designed point cloud inputs, efficient point cloud encoders, and multimodal transformers to learn 3D point cloud representations and integrate them with language instructions for action prediction. PolarNet is shown to be effective and data efficient in a variety of experiments conducted on the RLBench benchmark. It outperforms state-of-the-art 2D and 3D approaches in both single-task and multi-task learning. It also achieves promising results on a real robot.
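As a rough illustration of how multi-view depth images can be fused into a single 3D point cloud input, here is a minimal NumPy sketch. It is not taken from the PolarNet code: the function names `depth_to_points` and `merge_views`, the pinhole camera model, and the axis-aligned workspace crop are all assumptions made for illustration.

```python
import numpy as np


def depth_to_points(depth, K, cam_to_world):
    """Back-project an (H, W) depth map into world-frame 3D points.

    Assumes a pinhole camera with 3x3 intrinsics K and a 4x4
    camera-to-world transform (hypothetical conventions, not PolarNet's).
    """
    h, w = depth.shape
    # u indexes image columns (x), v indexes image rows (y).
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    pts_cam = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    # Apply the rigid transform using homogeneous coordinates.
    ones = np.ones((pts_cam.shape[0], 1))
    pts_h = np.concatenate([pts_cam, ones], axis=1)
    return (pts_h @ cam_to_world.T)[:, :3]


def merge_views(depths, intrinsics, extrinsics, ws_min, ws_max):
    """Fuse several camera views and keep points inside a workspace box."""
    pts = np.concatenate(
        [depth_to_points(d, K, T)
         for d, K, T in zip(depths, intrinsics, extrinsics)],
        axis=0,
    )
    # Axis-aligned crop: an assumed, simple way to discard background points.
    inside = np.all((pts >= ws_min) & (pts <= ws_max), axis=1)
    return pts[inside]
```

Because every camera's points end up in one shared world frame, the merged cloud sidesteps the per-view fusion problem that the abstract attributes to 2D image representations. The actual PolarNet preprocessing may differ in its details.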

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Robot Manipulation | RLBench | Input Image Size | 128 | PolarNet |
| Robot Manipulation | RLBench | Succ. Rate (10 tasks, 100 demos/task) | 89.8 | PolarNet |
| Robot Manipulation | RLBench | Succ. Rate (18 tasks, 100 demo/task) | 46.4 | PolarNet |
| Robot Manipulation | RLBench | Succ. Rate (74 tasks, 100 demos/task) | 60.3 | PolarNet |
| Robot Manipulation | RLBench | Training Time (V100 x 8 x day) | 5 | PolarNet |
| Robot Manipulation | GEMBench | Average Success Rate | 38.4 | PolarNet |

Related Papers

SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)
Robust-Multi-Task Gradient Boosting (2025-07-15)
SAMO: A Lightweight Sharpness-Aware Approach for Multi-Task Optimization with Joint Global-Local Perturbation (2025-07-10)
DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge (2025-07-06)
Geometry-aware 4D Video Generation for Robot Manipulation (2025-07-01)
Opportunistic Osteoporosis Diagnosis via Texture-Preserving Self-Supervision, Mixture of Experts and Multi-Task Integration (2025-06-25)
AnchorDP3: 3D Affordance Guided Sparse Diffusion Policy for Robotic Manipulation (2025-06-24)
An Audio-centric Multi-task Learning Framework for Streaming Ads Targeting on Spotify (2025-06-23)