TransMVSNet: Global Context-aware Multi-view Stereo Network with Transformers

Yikang Ding, Wentao Yuan, Qingtian Zhu, Haotian Zhang, Xiangyue Liu, Yuanjiang Wang, Xiao Liu

2021-11-29 | CVPR 2022 | 3D Reconstruction

Abstract

In this paper, we present TransMVSNet, which grows out of our exploration of feature matching in multi-view stereo (MVS). We recast MVS as what it fundamentally is, a feature matching task, and propose a Feature Matching Transformer (FMT) that uses intra- (self-) and inter- (cross-) attention to aggregate long-range context within and across images. To help the FMT adapt, we use an Adaptive Receptive Field (ARF) module to ensure a smooth transition in the scope of features, and we bridge different stages with a feature pathway that passes transformed features and gradients across scales. In addition, we apply pair-wise feature correlation to measure similarity between features and adopt an ambiguity-reducing focal loss to strengthen supervision. To the best of our knowledge, TransMVSNet is the first attempt to apply Transformers to the task of MVS. Our method achieves state-of-the-art performance on the DTU dataset, the Tanks and Temples benchmark, and the BlendedMVS dataset. The code will be made available at https://github.com/MegviiRobot/TransMVSNet.
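
The last two ingredients of the abstract, pair-wise feature correlation and a focal loss over depth hypotheses, can be illustrated for a single pixel. The following is a minimal sketch, not the authors' implementation. It assumes that correlation is a plain inner product between the reference feature and the source feature sampled at each depth hypothesis, that a softmax turns the scores into a probability over hypotheses, and that the focal loss takes the standard form -(1 - p)^gamma * log(p) at the ground-truth hypothesis. All function names, the toy feature values, and gamma=2.0 are hypothetical choices made for illustration.

```python
import math

def correlation(ref_feat, src_feats_per_depth):
    """Inner product between one reference feature vector and the source
    feature sampled at each depth hypothesis (one vector per hypothesis)."""
    return [sum(r * s for r, s in zip(ref_feat, src))
            for src in src_feats_per_depth]

def softmax(scores):
    """Numerically stable softmax over the depth hypotheses."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def focal_loss(probs, gt_index, gamma=2.0):
    """Standard focal loss at the ground-truth hypothesis.
    gamma=0 reduces it to ordinary cross-entropy."""
    p = probs[gt_index]
    return -((1.0 - p) ** gamma) * math.log(p)

# Toy data (assumed): a 2-channel reference feature and three depth hypotheses.
ref = [1.0, 0.0]
src = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]

scores = correlation(ref, src)   # similarity per depth hypothesis
probs = softmax(scores)          # probability per depth hypothesis
loss_correct = focal_loss(probs, gt_index=1)  # ground truth is the best match
loss_wrong = focal_loss(probs, gt_index=0)    # ground truth is the worst match
```

In this toy case the hypothesis with the highest correlation receives the highest probability, so the loss is small when it is the ground truth and large otherwise. The (1 - p)^gamma factor down-weights pixels that are already confidently correct, which is the general motivation for focal losses.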

Results

Task	Dataset	Metric	Value (mm, lower is better)	Model
3D Reconstruction	DTU	Acc	0.321	TransMVSNet
3D Reconstruction	DTU	Comp	0.289	TransMVSNet
3D Reconstruction	DTU	Overall	0.305	TransMVSNet
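
On DTU, the Overall score is conventionally the mean of the accuracy (Acc) and completeness (Comp) errors. The short check below applies that convention to the values in the table; the variable names are only illustrative.

```python
# Values copied from the results table above.
acc = 0.321
comp = 0.289

# Overall is taken as the arithmetic mean of Acc and Comp.
overall = (acc + comp) / 2
print(f"{overall:.3f}")  # prints 0.305
```

The result matches the Overall value reported in the table.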
