Papers With Code

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


When Epipolar Constraint Meets Non-local Operators in Multi-View Stereo

Tianqi Liu, Xinyi Ye, Weiyue Zhao, Zhiyu Pan, Min Shi, Zhiguo Cao

2023-09-29 · ICCV 2023
Tasks: Stereo Matching · Point Clouds · 3D Reconstruction
Paper · PDF · Code (official)

Abstract

Learning-based multi-view stereo (MVS) methods heavily rely on feature matching, which requires distinctive and descriptive representations. An effective solution is to apply non-local feature aggregation, e.g., Transformer. Albeit useful, these techniques introduce heavy computation overheads for MVS: each pixel densely attends to the whole image. In contrast, we propose to constrain non-local feature augmentation within a pair of lines: each point only attends to the corresponding pair of epipolar lines. Our idea takes inspiration from classic epipolar geometry, which shows that one point with different depth hypotheses will be projected onto the epipolar line in the other view. This constraint reduces the 2D search space to the epipolar line in stereo matching. Similarly, this suggests that matching in MVS amounts to distinguishing a series of points lying on the same line. Inspired by this point-to-line search, we devise a line-to-point non-local augmentation strategy. We first devise an optimized searching algorithm to split the 2D feature maps into epipolar line pairs. Then, an Epipolar Transformer (ET) performs non-local feature augmentation among epipolar line pairs. We incorporate the ET into a learning-based MVS baseline, named ET-MVSNet. ET-MVSNet achieves state-of-the-art reconstruction performance on both the DTU and Tanks-and-Temples benchmarks with high efficiency. Code is available at https://github.com/TQTQliu/ET-MVSNet.
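The epipolar constraint the abstract appeals to can be illustrated with a small geometry sketch: back-projecting one reference pixel at several depth hypotheses and projecting those 3D points into a second view always yields collinear image points, i.e. they all lie on the epipolar line. This is the point-to-line search the paper builds on. A minimal NumPy sketch under assumed, illustrative camera parameters (the intrinsics `K`, rotation `R`, and baseline `t` below are hypothetical, not taken from the paper's code):

```python
import numpy as np

# Hypothetical two-view setup: reference camera at the origin,
# source camera related by rotation R and translation t.
K = np.array([[500.0,   0.0, 320.0],
              [  0.0, 500.0, 240.0],
              [  0.0,   0.0,   1.0]])   # shared intrinsics (assumed)
R = np.eye(3)                            # source rotation w.r.t. reference
t = np.array([0.2, 0.0, 0.0])            # source translation (baseline)

def project_depth_hypotheses(px, py, depths):
    """Back-project a reference pixel at several depths, then project
    each resulting 3D point into the source view."""
    ray = np.linalg.inv(K) @ np.array([px, py, 1.0])  # normalized viewing ray
    pts3d = depths[:, None] * ray                     # (D, 3) points along the ray
    cam_src = pts3d @ R.T + t                         # into the source camera frame
    proj = cam_src @ K.T                              # homogeneous pixel coords
    return proj[:, :2] / proj[:, 2:3]                 # (D, 2) dehomogenized pixels

depths = np.linspace(1.0, 5.0, 5)
uv = project_depth_hypotheses(400.0, 300.0, depths)

# Collinearity check: the centered projections have rank 1,
# so the second singular value is (numerically) zero.
centered = uv - uv.mean(axis=0)
s = np.linalg.svd(centered, compute_uv=False)
print(s[1] < 1e-6)  # True: all depth hypotheses project onto one epipolar line
```

The same observation motivates restricting attention in the Epipolar Transformer: since every candidate match for a reference pixel lies on one line in the source view, non-local aggregation along that line pair suffices instead of attending over the full image.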

Results

Task              | Dataset           | Metric                 | Value | Model
3D Reconstruction | DTU               | Acc                    | 0.329 | ET-MVSNet
3D Reconstruction | DTU               | Comp                   | 0.253 | ET-MVSNet
3D Reconstruction | DTU               | Overall                | 0.291 | ET-MVSNet
Point Clouds      | Tanks and Temples | Mean F1 (Advanced)     | 40.41 | ET-MVSNet
Point Clouds      | Tanks and Temples | Mean F1 (Intermediate) | 65.49 | ET-MVSNet

Related Papers

$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
DiffRhythm+: Controllable and Flexible Full-Length Song Generation with Preference Optimization (2025-07-17)
AutoPartGen: Autogressive 3D Part Generation and Discovery (2025-07-17)
Assay2Mol: large language model-based drug design using BioAssay context (2025-07-16)
Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)
SpatialTrackerV2: 3D Point Tracking Made Easy (2025-07-16)
BRUM: Robust 3D Vehicle Reconstruction from 360 Sparse Images (2025-07-16)
Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation (2025-07-15)