Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Depthformer : Multiscale Vision Transformer For Monocular Depth Estimation With Local Global Information Fusion

Ashutosh Agarwal, Chetan Arora

2022-07-10 | Depth Prediction, Semantic Segmentation, Depth Estimation, Monocular Depth Estimation

Abstract

Attention-based models such as transformers have shown outstanding performance on dense prediction tasks, such as semantic segmentation, owing to their capability of capturing long-range dependencies in an image. However, the benefit of transformers for monocular depth prediction has seldom been explored so far. This paper benchmarks various transformer-based models for the depth estimation task on the indoor NYUV2 dataset and the outdoor KITTI dataset. We propose a novel attention-based architecture, Depthformer, for monocular depth estimation that uses multi-head self-attention to produce multiscale feature maps, which are effectively combined by our proposed decoder network. We also propose a Transbins module that divides the depth range into bins whose center values are estimated adaptively per image. The final estimated depth for each pixel is a linear combination of the bin centers. The Transbins module takes advantage of the global receptive field using the transformer module in the encoding stage. Experimental results on the NYUV2 and KITTI depth estimation benchmarks demonstrate that our proposed method improves the state of the art by 3.3% and 3.3%, respectively, in terms of Root Mean Squared Error (RMSE). Code is available at https://github.com/ashutosh1807/Depthformer.git.
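The bin-based depth readout described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract only states that bin centers are estimated per image and that each pixel's depth is a linear combination of those centers. Deriving centers from normalized bin widths and using softmax weights per pixel are assumptions borrowed from common adaptive-binning designs, and the function names `bin_centers`, `softmax`, and `pixel_depth` are hypothetical.

```python
import math


def bin_centers(widths, d_min, d_max):
    """Map positive per-image bin widths to bin centers covering [d_min, d_max].

    Assumption: the network outputs one positive width per bin; the widths
    are normalized here so that the bins tile the full depth range.
    """
    total = sum(widths)
    centers = []
    left = d_min  # left edge of the current bin
    for w in widths:
        span = (d_max - d_min) * w / total
        centers.append(left + span / 2.0)
        left += span
    return centers


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


def pixel_depth(logits, centers):
    """Depth of one pixel as a probability-weighted sum of bin centers.

    Assumption: the per-pixel weights come from a softmax over bin logits.
    """
    probs = softmax(logits)
    return sum(p * c for p, c in zip(probs, centers))


# Three bins over a hypothetical 0 to 8 metre range; the last bin is twice as wide.
centers = bin_centers([1.0, 1.0, 2.0], d_min=0.0, d_max=8.0)
depth = pixel_depth([0.0, 0.0, 0.0], centers)  # uniform weights over the bins
```

Because the centers are recomputed for every image while the weights vary per pixel, the same number of bins can concentrate resolution wherever a given scene has most of its depth values.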

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Depth Estimation | NYU-Depth V2 | Delta < 1.25 | 0.913 | Depthformer |
| Depth Estimation | NYU-Depth V2 | Delta < 1.25^2 | 0.988 | Depthformer |
| Depth Estimation | NYU-Depth V2 | Delta < 1.25^3 | 0.997 | Depthformer |
| Depth Estimation | NYU-Depth V2 | RMSE | 0.345 | Depthformer |
| Depth Estimation | NYU-Depth V2 | absolute relative error | 0.1 | Depthformer |
| Depth Estimation | NYU-Depth V2 | log 10 | 0.042 | Depthformer |
| Depth Estimation | KITTI Eigen split | Delta < 1.25 | 0.967 | Depthformer |
| Depth Estimation | KITTI Eigen split | Delta < 1.25^2 | 0.996 | Depthformer |
| Depth Estimation | KITTI Eigen split | Delta < 1.25^3 | 0.999 | Depthformer |
| Depth Estimation | KITTI Eigen split | RMSE | 2.285 | Depthformer |
| Depth Estimation | KITTI Eigen split | RMSE log | 0.087 | Depthformer |
| Depth Estimation | KITTI Eigen split | Sq Rel | 0.187 | Depthformer |
| Depth Estimation | KITTI Eigen split | absolute relative error | 0.058 | Depthformer |
| 3D | NYU-Depth V2 | Delta < 1.25 | 0.913 | Depthformer |
| 3D | NYU-Depth V2 | Delta < 1.25^2 | 0.988 | Depthformer |
| 3D | NYU-Depth V2 | Delta < 1.25^3 | 0.997 | Depthformer |
| 3D | NYU-Depth V2 | RMSE | 0.345 | Depthformer |
| 3D | NYU-Depth V2 | absolute relative error | 0.1 | Depthformer |
| 3D | NYU-Depth V2 | log 10 | 0.042 | Depthformer |
| 3D | KITTI Eigen split | Delta < 1.25 | 0.967 | Depthformer |
| 3D | KITTI Eigen split | Delta < 1.25^2 | 0.996 | Depthformer |
| 3D | KITTI Eigen split | Delta < 1.25^3 | 0.999 | Depthformer |
| 3D | KITTI Eigen split | RMSE | 2.285 | Depthformer |
| 3D | KITTI Eigen split | RMSE log | 0.087 | Depthformer |
| 3D | KITTI Eigen split | Sq Rel | 0.187 | Depthformer |
| 3D | KITTI Eigen split | absolute relative error | 0.058 | Depthformer |
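
This page does not define the metrics reported above. As a rough guide, the sketch below computes them using the definitions conventionally used in monocular depth evaluation; that the paper follows exactly these definitions, including any depth caps or crops, is an assumption. The function name `depth_metrics` and the dictionary keys are hypothetical, and inputs are flat lists of predicted and ground-truth depths.

```python
import math


def depth_metrics(pred, gt):
    """Conventional depth-estimation metrics over paired depth values.

    Assumption: pixels without valid ground truth are marked with gt <= 0
    and are skipped, as are non-positive predictions.
    """
    pairs = [(p, g) for p, g in zip(pred, gt) if g > 0 and p > 0]
    if not pairs:
        raise ValueError("no valid pixels to evaluate")
    n = len(pairs)

    # Threshold accuracy: share of pixels whose ratio error is below 1.25^k.
    ratios = [max(p / g, g / p) for p, g in pairs]

    return {
        "delta1": sum(r < 1.25 for r in ratios) / n,
        "delta2": sum(r < 1.25 ** 2 for r in ratios) / n,
        "delta3": sum(r < 1.25 ** 3 for r in ratios) / n,
        "rmse": math.sqrt(sum((p - g) ** 2 for p, g in pairs) / n),
        "rmse_log": math.sqrt(
            sum((math.log(p) - math.log(g)) ** 2 for p, g in pairs) / n
        ),
        "abs_rel": sum(abs(p - g) / g for p, g in pairs) / n,
        "sq_rel": sum((p - g) ** 2 / g for p, g in pairs) / n,
        "log10": sum(abs(math.log10(p) - math.log10(g)) for p, g in pairs) / n,
    }


# Toy example: two exact predictions and one that is off by a factor of two.
metrics = depth_metrics(pred=[1.0, 2.0, 4.0], gt=[1.0, 2.0, 2.0])
```

Higher is better for the three Delta thresholds; lower is better for every other metric in the table.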

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
$S^2M^2$: Scalable Stereo Matching Model for Reliable Depth Estimation (2025-07-17)
$π^3$: Scalable Permutation-Equivariant Visual Geometry Learning (2025-07-17)
SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation (2025-07-16)