Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation

Mu Hu, Wei Yin, Chi Zhang, Zhipeng Cai, Xiaoxiao Long, Kaixuan Wang, Hao Chen, Gang Yu, Chunhua Shen, Shaojie Shen

2024-03-22 | Under review for Transaction 2024 4 | Zero-shot Generalization, Surface Normal Estimation, Depth Estimation

Abstract

We introduce Metric3D v2, a geometric foundation model for zero-shot metric depth and surface normal estimation from a single image, which is crucial for metric 3D recovery. While depth and normal are geometrically related and highly complementary, they present distinct challenges. SoTA monocular depth methods achieve zero-shot generalization by learning affine-invariant depths, which cannot recover real-world metrics. Meanwhile, SoTA normal estimation methods have limited zero-shot performance due to the lack of large-scale labeled data. To tackle these issues, we propose solutions for both metric depth estimation and surface normal estimation. For metric depth estimation, we show that the key to a zero-shot single-view model lies in resolving the metric ambiguity arising from various camera models and in training on large-scale data. We propose a canonical camera space transformation module, which explicitly addresses the ambiguity problem and can be effortlessly plugged into existing monocular models. For surface normal estimation, we propose a joint depth-normal optimization module to distill diverse data knowledge from metric depth, enabling normal estimators to learn beyond normal labels. Equipped with these modules, our depth-normal models can be stably trained on over 16 million images from thousands of camera models with different types of annotations, resulting in zero-shot generalization to in-the-wild images with unseen camera settings. Our method enables the accurate recovery of metric 3D structures from randomly collected internet images, paving the way for plausible single-image metrology. Our project page is at https://JUGGHM.github.io/Metric3Dv2.
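
The metric ambiguity that the canonical camera space transformation targets can be illustrated with a small sketch. The abstract does not give the module's formula, so the code below is an assumption and not the authors' implementation: it supposes a pinhole camera, a hypothetical canonical focal length of 1000 pixels, and depth labels rescaled by the ratio of canonical to actual focal length. The names `Camera`, `projected_size`, `to_canonical_depth`, and `from_canonical_depth` are illustrative only.

```python
from dataclasses import dataclass

# Hypothetical canonical focal length in pixels (an assumption for illustration).
CANONICAL_FOCAL = 1000.0


@dataclass
class Camera:
    focal: float  # focal length in pixels (pinhole model assumed)


def projected_size(object_size_m: float, depth_m: float, cam: Camera) -> float:
    """Pixel extent of an object under a pinhole camera: f * size / depth."""
    return cam.focal * object_size_m / depth_m


def to_canonical_depth(depth_m: float, cam: Camera,
                       canonical_focal: float = CANONICAL_FOCAL) -> float:
    """Rescale a metric depth into the assumed canonical camera space."""
    return depth_m * canonical_focal / cam.focal


def from_canonical_depth(depth_c: float, cam: Camera,
                         canonical_focal: float = CANONICAL_FOCAL) -> float:
    """Invert the rescaling to recover metric depth for the real camera."""
    return depth_c * cam.focal / canonical_focal


# Two cameras see a 1 m object at different distances...
wide = Camera(focal=500.0)
tele = Camera(focal=1000.0)

# ...yet both images show the object with the same pixel extent (250.0 px),
# so appearance alone cannot tell 2 m from 4 m.
same_appearance = projected_size(1.0, 2.0, wide) == projected_size(1.0, 4.0, tele)

# After the assumed transformation both samples share one canonical depth,
# which gives a network a consistent target for identical-looking inputs.
canon_wide = to_canonical_depth(2.0, wide)
canon_tele = to_canonical_depth(4.0, tele)

# Metric depth is recovered by undoing the scaling with the known focal length.
recovered = from_canonical_depth(canon_wide, wide)
```

Under these assumptions, the focal length must be known at inference time so that the canonical prediction can be mapped back to metric depth.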

Results

Task	Dataset	Metric	Value	Model
Depth Estimation	NYU-Depth V2	Delta < 1.25	0.989	Metric3Dv2(L, FT)
Depth Estimation	NYU-Depth V2	Delta < 1.25^2	0.998	Metric3Dv2(L, FT)
Depth Estimation	NYU-Depth V2	Delta < 1.25^3	1	Metric3Dv2(L, FT)
Depth Estimation	NYU-Depth V2	RMSE	0.183	Metric3Dv2(L, FT)
Depth Estimation	NYU-Depth V2	absolute relative error	0.047	Metric3Dv2(L, FT)
Depth Estimation	NYU-Depth V2	log 10	0.02	Metric3Dv2(L, FT)
Depth Estimation	IBims-1	Delta < 1.25	0.969	Metric3Dv2(L, ZS)
Depth Estimation	KITTI Eigen split	Delta < 1.25	0.989	Metric3Dv2 (g2, FT, 80m, flip_aug_test)
Depth Estimation	KITTI Eigen split	Delta < 1.25^2	0.998	Metric3Dv2 (g2, FT, 80m, flip_aug_test)
Depth Estimation	KITTI Eigen split	Delta < 1.25^3	1	Metric3Dv2 (g2, FT, 80m, flip_aug_test)
Depth Estimation	KITTI Eigen split	RMSE	1.766	Metric3Dv2 (g2, FT, 80m, flip_aug_test)
Depth Estimation	KITTI Eigen split	RMSE log	0.06	Metric3Dv2 (g2, FT, 80m, flip_aug_test)
Depth Estimation	KITTI Eigen split	absolute relative error	0.039	Metric3Dv2 (g2, FT, 80m, flip_aug_test)
3D	NYU-Depth V2	Delta < 1.25	0.989	Metric3Dv2(L, FT)
3D	NYU-Depth V2	Delta < 1.25^2	0.998	Metric3Dv2(L, FT)
3D	NYU-Depth V2	Delta < 1.25^3	1	Metric3Dv2(L, FT)
3D	NYU-Depth V2	RMSE	0.183	Metric3Dv2(L, FT)
3D	NYU-Depth V2	absolute relative error	0.047	Metric3Dv2(L, FT)
3D	NYU-Depth V2	log 10	0.02	Metric3Dv2(L, FT)
3D	IBims-1	Delta < 1.25	0.969	Metric3Dv2(L, ZS)
3D	KITTI Eigen split	Delta < 1.25	0.989	Metric3Dv2 (g2, FT, 80m, flip_aug_test)
3D	KITTI Eigen split	Delta < 1.25^2	0.998	Metric3Dv2 (g2, FT, 80m, flip_aug_test)
3D	KITTI Eigen split	Delta < 1.25^3	1	Metric3Dv2 (g2, FT, 80m, flip_aug_test)
3D	KITTI Eigen split	RMSE	1.766	Metric3Dv2 (g2, FT, 80m, flip_aug_test)
3D	KITTI Eigen split	RMSE log	0.06	Metric3Dv2 (g2, FT, 80m, flip_aug_test)
3D	KITTI Eigen split	absolute relative error	0.039	Metric3Dv2 (g2, FT, 80m, flip_aug_test)
Surface Normals Estimation	IBims-1	% < 11.25	69.7	Metric3Dv2(g2, ZS)
Surface Normals Estimation	IBims-1	% < 22.5	76.2	Metric3Dv2(g2, ZS)
Surface Normals Estimation	IBims-1	% < 30	78.8	Metric3Dv2(g2, ZS)
Surface Normals Estimation	IBims-1	Mean	19.6	Metric3Dv2(g2, ZS)
Surface Normals Estimation	ScanNetV2	% < 11.25	77.8	Metric3Dv2 (g2, In-domain)
Surface Normals Estimation	ScanNetV2	% < 22.5	90.1	Metric3Dv2 (g2, In-domain)
Surface Normals Estimation	ScanNetV2	% < 30	93.5	Metric3Dv2 (g2, In-domain)
Surface Normals Estimation	ScanNetV2	Mean Angle Error	9.2	Metric3Dv2 (g2, In-domain)
Surface Normals Estimation	NYU-Depth V2	% < 11.25	68.8	Metric3Dv2(L, FT)
Surface Normals Estimation	NYU-Depth V2	% < 22.5	84.9	Metric3Dv2(L, FT)
Surface Normals Estimation	NYU-Depth V2	% < 30	89.8	Metric3Dv2(L, FT)
Surface Normals Estimation	NYU-Depth V2	Mean Angle Error	12	Metric3Dv2(L, FT)
Surface Normals Estimation	NYU-Depth V2	RMSE	19.2	Metric3Dv2(L, FT)
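
For readers unfamiliar with the metric names in the table, the sketch below computes them using their commonly used definitions. The page does not state which evaluation code produced the reported values, so the exact masking and averaging conventions here are assumptions, and the function names `depth_metrics`, `angular_error_deg`, and `normal_metrics` are illustrative. Predictions are assumed to be strictly positive, and pixels with non-positive ground truth are skipped.

```python
import math
from typing import Dict, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def depth_metrics(pred: Sequence[float], gt: Sequence[float]) -> Dict[str, float]:
    """Common depth metrics over valid pixels (ground truth > 0)."""
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0]
    if not pairs:
        raise ValueError("no valid ground-truth pixels")
    n = len(pairs)
    # Threshold accuracy uses the symmetric ratio max(pred/gt, gt/pred).
    ratios = [max(p / g, g / p) for p, g in pairs]
    return {
        "delta_1.25": sum(r < 1.25 for r in ratios) / n,
        "delta_1.25^2": sum(r < 1.25 ** 2 for r in ratios) / n,
        "delta_1.25^3": sum(r < 1.25 ** 3 for r in ratios) / n,
        "abs_rel": sum(abs(p - g) / g for p, g in pairs) / n,
        "rmse": math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n),
        "rmse_log": math.sqrt(
            sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
        ),
        "log10": sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n,
    }


def angular_error_deg(a: Vec3, b: Vec3) -> float:
    """Angle in degrees between two non-zero normal vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    # Clamp to [-1, 1] to guard acos against floating-point drift.
    cos = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos))


def normal_metrics(pred: Sequence[Vec3], gt: Sequence[Vec3]) -> Dict[str, float]:
    """Mean and RMS angular error plus percentage of pixels under each threshold."""
    errors = [angular_error_deg(p, g) for p, g in zip(pred, gt)]
    if not errors:
        raise ValueError("no normals to evaluate")
    n = len(errors)
    return {
        "mean": sum(errors) / n,
        "rmse": math.sqrt(sum(e * e for e in errors) / n),
        "pct_11.25": 100.0 * sum(e < 11.25 for e in errors) / n,
        "pct_22.5": 100.0 * sum(e < 22.5 for e in errors) / n,
        "pct_30": 100.0 * sum(e < 30.0 for e in errors) / n,
    }


# Toy inputs: one exact depth prediction and one that is off by a factor of two.
toy_depth = depth_metrics(pred=[1.0, 2.0], gt=[1.0, 4.0])

# Toy inputs: one exact normal and one that is perpendicular to the ground truth.
toy_normals = normal_metrics(
    pred=[(0.0, 0.0, 1.0), (1.0, 0.0, 0.0)],
    gt=[(0.0, 0.0, 1.0), (0.0, 1.0, 0.0)],
)
```

In this reading, the Delta rows are fractions in [0, 1] where higher is better, the error rows (RMSE, absolute relative error, log 10, mean angle error) are lower-is-better, and the "% <" rows for normals are percentages of pixels whose angular error falls under the threshold in degrees.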
