Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Generalized Binary Search Network for Highly-Efficient Multi-View Stereo

Zhenxing Mi, Di Chang, Dan Xu

2021-12-04 · CVPR 2022 · Point Clouds · Depth Prediction · 3D Reconstruction · Depth Estimation

Paper · PDF · Code (official)

Abstract

Multi-view Stereo (MVS) with known camera parameters is essentially a 1D search problem within a valid depth range. Recent deep-learning-based MVS methods typically sample depth hypotheses densely in this range and then construct prohibitively memory-consuming 3D cost volumes for depth prediction. Although coarse-to-fine sampling strategies alleviate this overhead to a certain extent, the efficiency of MVS remains an open challenge. In this work, we propose a novel method for highly efficient MVS that remarkably decreases the memory footprint while clearly advancing state-of-the-art depth prediction performance. We investigate what search strategy can be reasonably optimal for MVS, taking into account both efficiency and effectiveness. We first formulate MVS as a binary search problem and accordingly propose a generalized binary search network for MVS. Specifically, in each step the depth range is split into 2 bins, with 1 extra error-tolerance bin on each side. A classification is performed to identify which bin contains the true depth. We also design three mechanisms to handle classification errors, deal with out-of-range samples, and decrease training memory, respectively. The new formulation lets our method sample only a very small number of depth hypotheses in each step, which is highly memory efficient and also greatly facilitates quick training convergence. Experiments on competitive benchmarks show that our method achieves state-of-the-art accuracy with much less memory. In particular, our method obtains an overall score of 0.289 on the DTU dataset and ranks first on the challenging Tanks and Temples advanced benchmark among all learning-based methods. The trained models and code will be released at https://github.com/MiZhenxing/GBi-Net.
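The per-pixel search procedure the abstract describes can be sketched in a few lines. This is a minimal single-pixel illustration, not the paper's implementation: the function and variable names are hypothetical, and an oracle `classify` callback stands in for the network's learned per-pixel bin classification over a cost volume.

```python
def generalized_binary_search(d_min, d_max, classify, steps=8):
    """Sketch of the generalized binary search over a depth range.

    Each step splits the current range [lo, hi] into 2 core bins and
    adds 1 error-tolerance bin on each side (4 hypotheses total, so the
    search can recover if an earlier step picked the wrong bin).
    `classify(edges)` returns the index (0..3) of the bin believed to
    contain the true depth; in GBi-Net this decision is made by a
    network, here it is an injected callback.
    """
    lo, hi = d_min, d_max
    for _ in range(steps):
        w = (hi - lo) / 2.0  # width of each of the 2 core bins
        # Bin edges: tolerance bin | core bin | core bin | tolerance bin
        edges = [lo - w, lo, lo + w, hi, hi + w]
        k = classify(edges)
        lo, hi = edges[k], edges[k + 1]  # recurse into the chosen bin
    return 0.5 * (lo + hi)  # midpoint of the final bin as depth estimate


# Demo with an oracle classifier that knows the true depth (a stand-in
# for the learned classification; values here are arbitrary).
true_depth = 4.37

def oracle(edges):
    for k in range(4):
        if edges[k] <= true_depth < edges[k + 1]:
            return k
    return 0 if true_depth < edges[0] else 3

estimate = generalized_binary_search(0.0, 10.0, oracle, steps=8)
```

With a correct classifier the selected bin always lies in the core pair, so the range halves each step and only 4 depth hypotheses are evaluated per step, which is the source of the memory savings over dense sampling.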

Results

Task | Dataset | Metric | Value | Model
3D Reconstruction | DTU | Acc | 0.312 | GBi-Net
3D Reconstruction | DTU | Comp | 0.293 | GBi-Net
3D Reconstruction | DTU | Overall | 0.303 | GBi-Net
Point Clouds | Tanks and Temples | Mean F1 (Intermediate) | 61.42 | GBi-Net

Related Papers

AutoPartGen: Autoregressive 3D Part Generation and Discovery (2025-07-17)
$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
BRUM: Robust 3D Vehicle Reconstruction from 360 Sparse Images (2025-07-16)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network (2025-07-15)