TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Unifying Flow, Stereo and Depth Estimation

Unifying Flow, Stereo and Depth Estimation

Haofei Xu, Jing Zhang, Jianfei Cai, Hamid Rezatofighi, Fisher Yu, DaCheng Tao, Andreas Geiger

2022-11-10Stereo MatchingStereo Depth EstimationOptical Flow EstimationDepth Estimation
PaperPDFCode(official)

Abstract

We present a unified formulation and model for three motion and 3D perception tasks: optical flow, rectified stereo matching and unrectified stereo depth estimation from posed images. Unlike previous specialized architectures for each specific task, we formulate all three tasks as a unified dense correspondence matching problem, which can be solved with a single model by directly comparing feature similarities. Such a formulation calls for discriminative feature representations, which we achieve using a Transformer, in particular the cross-attention mechanism. We demonstrate that cross-attention enables integration of knowledge from another image via cross-view interactions, which greatly improves the quality of the extracted features. Our unified model naturally enables cross-task transfer since the model architecture and parameters are shared across tasks. We outperform RAFT with our unified model on the challenging Sintel dataset, and our final model that uses a few additional task-specific refinement steps outperforms or compares favorably to recent state-of-the-art methods on 10 popular flow, stereo and depth datasets, while being simpler and more efficient in terms of model design and inference speed.

Results

TaskDatasetMetricValueModel
Optical Flow EstimationSintel-cleanAverage End-Point Error1.03GMFlow
Optical Flow EstimationSintel-finalAverage End-Point Error2.37GMFlow

Related Papers

$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation2025-07-17Channel-wise Motion Features for Efficient Motion Segmentation2025-07-17$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning2025-07-17Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation2025-07-16Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios2025-07-16MonoMVSNet: Monocular Priors Guided Multi-View Stereo Network2025-07-15Towards Depth Foundation Model: Recent Trends in Vision-Based Depth Estimation2025-07-15Cameras as Relative Positional Encoding2025-07-14