Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

MegaPose: 6D Pose Estimation of Novel Objects via Render & Compare

Yann Labbé, Lucas Manuelli, Arsalan Mousavian, Stephen Tyree, Stan Birchfield, Jonathan Tremblay, Justin Carpentier, Mathieu Aubry, Dieter Fox, Josef Sivic

Published: 2022-12-13
Tasks: Pose Estimation, 3D Object Detection, 6D Pose Estimation

Abstract

We introduce MegaPose, a method to estimate the 6D pose of novel objects, that is, objects unseen during training. At inference time, the method only assumes knowledge of (i) a region of interest displaying the object in the image and (ii) a CAD model of the observed object. The contributions of this work are threefold. First, we present a 6D pose refiner based on a render&compare strategy which can be applied to novel objects. The shape and coordinate system of the novel object are provided as inputs to the network by rendering multiple synthetic views of the object's CAD model. Second, we introduce a novel approach for coarse pose estimation which leverages a network trained to classify whether the pose error between a synthetic rendering and an observed image of the same object can be corrected by the refiner. Third, we introduce a large-scale synthetic dataset of photorealistic images of thousands of objects with diverse visual and shape properties and show that this diversity is crucial to obtain good generalization performance on novel objects. We train our approach on this large synthetic dataset and apply it without retraining to hundreds of novel objects in real images from several pose estimation benchmarks. Our approach achieves state-of-the-art performance on the ModelNet and YCB-Video datasets. An extensive evaluation on the 7 core datasets of the BOP challenge demonstrates that our approach achieves performance competitive with existing approaches that require access to the target objects during training. Code, dataset and trained models are available on the project page: https://megapose6d.github.io/.
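The coarse-then-refine flow described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: `estimate_pose`, `score_fn`, `refine_fn`, and `n_refine_iters` are hypothetical names, the pose is reduced to a single number for the toy usage, and the number of hypotheses and refinement iterations are assumptions. In the paper, the scoring role is played by the classifier network and the update role by the render & compare refiner.

```python
from typing import Callable, Iterable, TypeVar

Pose = TypeVar("Pose")


def estimate_pose(
    hypotheses: Iterable[Pose],
    score_fn: Callable[[Pose], float],
    refine_fn: Callable[[Pose], Pose],
    n_refine_iters: int = 5,  # assumed value, not taken from the paper
) -> Pose:
    """Pick the best-scoring coarse hypothesis, then refine it iteratively."""
    # Coarse stage: keep the hypothesis the scorer rates highest
    # (standing in for "most likely to be correctable by the refiner").
    pose = max(hypotheses, key=score_fn)
    # Refinement stage: repeatedly apply the pose update.
    for _ in range(n_refine_iters):
        pose = refine_fn(pose)
    return pose


# Toy usage with a 1-D "pose" whose true value is 0.3 (purely illustrative).
TRUE_POSE = 0.3


def toy_score(pose: float) -> float:
    # Higher score for hypotheses closer to the true pose.
    return -abs(pose - TRUE_POSE)


def toy_refine(pose: float) -> float:
    # Each step removes half of the remaining error.
    return pose + 0.5 * (TRUE_POSE - pose)


result = estimate_pose([0.0, 0.5, 1.0], toy_score, toy_refine)
```

Passing the scorer and refiner as callables keeps the sketch independent of any particular network or renderer.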

Results

Task | Dataset | Metric | Value | Model
Pose Estimation | DTTD-Mobile | ADD AUC | 49.02 | MegaPose-RGBD (refined)
Pose Estimation | DTTD-Mobile | ADD-S AUC | 62.44 | MegaPose-RGBD (refined)
Pose Estimation | DTTD-Mobile | AR CH | 8.77 | MegaPose-RGBD (refined)
Pose Estimation | DTTD-Mobile | AR CoU | 17.73 | MegaPose-RGBD (refined)
Pose Estimation | DTTD-Mobile | AR pCH | 57 | MegaPose-RGBD (refined)
Pose Estimation | DTTD-Mobile | AR CH | 6.67 | MegaPose-RGBD (Coarse)
Pose Estimation | DTTD-Mobile | AR CoU | 13.72 | MegaPose-RGBD (Coarse)
Pose Estimation | DTTD-Mobile | AR pCH | 58.05 | MegaPose-RGBD (Coarse)
Object Detection | DTTD-Mobile | ADD AUC | 49.02 | MegaPose-RGBD
Object Detection | DTTD-Mobile | ADD-S AUC | 62.44 | MegaPose-RGBD
3D | DTTD-Mobile | ADD AUC | 49.02 | MegaPose-RGBD
3D | DTTD-Mobile | ADD-S AUC | 62.44 | MegaPose-RGBD
3D | DTTD-Mobile | ADD AUC | 49.02 | MegaPose-RGBD (refined)
3D | DTTD-Mobile | ADD-S AUC | 62.44 | MegaPose-RGBD (refined)
3D | DTTD-Mobile | AR CH | 8.77 | MegaPose-RGBD (refined)
3D | DTTD-Mobile | AR CoU | 17.73 | MegaPose-RGBD (refined)
3D | DTTD-Mobile | AR pCH | 57 | MegaPose-RGBD (refined)
3D | DTTD-Mobile | AR CH | 6.67 | MegaPose-RGBD (Coarse)
3D | DTTD-Mobile | AR CoU | 13.72 | MegaPose-RGBD (Coarse)
3D | DTTD-Mobile | AR pCH | 58.05 | MegaPose-RGBD (Coarse)
3D Object Detection | DTTD-Mobile | ADD AUC | 49.02 | MegaPose-RGBD
3D Object Detection | DTTD-Mobile | ADD-S AUC | 62.44 | MegaPose-RGBD
6D Pose Estimation | DTTD-Mobile | ADD AUC | 49.02 | MegaPose-RGBD (refined)
6D Pose Estimation | DTTD-Mobile | ADD-S AUC | 62.44 | MegaPose-RGBD (refined)
6D Pose Estimation | DTTD-Mobile | AR CH | 8.77 | MegaPose-RGBD (refined)
6D Pose Estimation | DTTD-Mobile | AR CoU | 17.73 | MegaPose-RGBD (refined)
6D Pose Estimation | DTTD-Mobile | AR pCH | 57 | MegaPose-RGBD (refined)
6D Pose Estimation | DTTD-Mobile | AR CH | 6.67 | MegaPose-RGBD (Coarse)
6D Pose Estimation | DTTD-Mobile | AR CoU | 13.72 | MegaPose-RGBD (Coarse)
6D Pose Estimation | DTTD-Mobile | AR pCH | 58.05 | MegaPose-RGBD (Coarse)
2D Classification | DTTD-Mobile | ADD AUC | 49.02 | MegaPose-RGBD
2D Classification | DTTD-Mobile | ADD-S AUC | 62.44 | MegaPose-RGBD
2D Object Detection | DTTD-Mobile | ADD AUC | 49.02 | MegaPose-RGBD
2D Object Detection | DTTD-Mobile | ADD-S AUC | 62.44 | MegaPose-RGBD
1 Image, 2*2 Stitchi | DTTD-Mobile | ADD AUC | 49.02 | MegaPose-RGBD (refined)
1 Image, 2*2 Stitchi | DTTD-Mobile | ADD-S AUC | 62.44 | MegaPose-RGBD (refined)
1 Image, 2*2 Stitchi | DTTD-Mobile | AR CH | 8.77 | MegaPose-RGBD (refined)
1 Image, 2*2 Stitchi | DTTD-Mobile | AR CoU | 17.73 | MegaPose-RGBD (refined)
1 Image, 2*2 Stitchi | DTTD-Mobile | AR pCH | 57 | MegaPose-RGBD (refined)
1 Image, 2*2 Stitchi | DTTD-Mobile | AR CH | 6.67 | MegaPose-RGBD (Coarse)
1 Image, 2*2 Stitchi | DTTD-Mobile | AR CoU | 13.72 | MegaPose-RGBD (Coarse)
1 Image, 2*2 Stitchi | DTTD-Mobile | AR pCH | 58.05 | MegaPose-RGBD (Coarse)
16k | DTTD-Mobile | ADD AUC | 49.02 | MegaPose-RGBD
16k | DTTD-Mobile | ADD-S AUC | 62.44 | MegaPose-RGBD
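For readers unfamiliar with the ADD and ADD-S metrics reported above, the following is a minimal sketch of their commonly used definitions: ADD averages the distance between corresponding model points under the ground-truth and predicted poses, while ADD-S uses the closest predicted point instead, which makes it tolerant of object symmetries. The function names, the 0.1 maximum threshold (assumed to be metres), and the simple rectangle-rule AUC are assumptions for illustration; the exact evaluation protocol behind the numbers in the table may differ.

```python
import math


def transform(R, t, p):
    """Apply rotation R (3x3 nested lists) and translation t to point p."""
    return tuple(sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3))


def add_error(points, R_gt, t_gt, R_pred, t_pred):
    """ADD: mean distance between corresponding transformed model points."""
    gt = [transform(R_gt, t_gt, p) for p in points]
    pred = [transform(R_pred, t_pred, p) for p in points]
    return sum(math.dist(a, b) for a, b in zip(gt, pred)) / len(points)


def add_s_error(points, R_gt, t_gt, R_pred, t_pred):
    """ADD-S: mean distance from each ground-truth point to its nearest predicted point."""
    gt = [transform(R_gt, t_gt, p) for p in points]
    pred = [transform(R_pred, t_pred, p) for p in points]
    return sum(min(math.dist(a, b) for b in pred) for a in gt) / len(points)


def auc(errors, max_threshold=0.1, steps=1000):
    """Area under the accuracy-vs-threshold curve, normalised to [0, 1].

    max_threshold and steps are assumed values, not taken from the source.
    """
    total = 0.0
    for k in range(1, steps + 1):
        threshold = max_threshold * k / steps
        total += sum(e <= threshold for e in errors) / len(errors)
    return total / steps


# A four-point model that is symmetric under a 180-degree turn about the z axis.
model = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0)]
identity = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
half_turn_z = [[-1, 0, 0], [0, -1, 0], [0, 0, 1]]
zero = (0, 0, 0)

add_value = add_error(model, identity, zero, half_turn_z, zero)      # 2.0
add_s_value = add_s_error(model, identity, zero, half_turn_z, zero)  # 0.0
```

In the usage example, the predicted pose is visually indistinguishable from the ground truth because of the symmetry, so ADD penalises it heavily while ADD-S reports zero error.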

Related Papers

$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
Revisiting Reliability in the Reasoning-based Pose Estimation Benchmark (2025-07-17)
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
From Neck to Head: Bio-Impedance Sensing for Head Pose Estimation (2025-07-17)
AthleticsPose: Authentic Sports Motion Dataset on Athletic Field and Evaluation of Monocular 3D Pose Estimation Ability (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
SGLoc: Semantic Localization System for Camera Pose Estimation from 3D Gaussian Splatting Representation (2025-07-16)