Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency

Seokju Lee, Sunghoon Im, Stephen Lin, In So Kweon

2021-02-04
Tasks: Optical Flow Estimation, Motion Estimation, Unsupervised Monocular Depth Estimation, Semantic Segmentation, Instance Segmentation, Video Instance Segmentation, Monocular Depth Estimation

Abstract

We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision. Our technical contributions are three-fold. First, we highlight the fundamental difference between inverse and forward projection while modeling the individual motion of each rigid object, and propose a geometrically correct projection pipeline using a neural forward projection module. Second, we design a unified instance-aware photometric and geometric consistency loss that holistically imposes self-supervisory signals for every background and object region. Lastly, we introduce a general-purpose auto-annotation scheme using any off-the-shelf instance segmentation and optical flow models to produce video instance segmentation maps that will be utilized as input to our training pipeline. These proposed elements are validated in a detailed ablation study. Through extensive experiments conducted on the KITTI and Cityscapes datasets, our framework is shown to outperform the state-of-the-art depth and motion estimation methods. Our code, dataset, and models are available at https://github.com/SeokjuLee/Insta-DM.
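
The abstract's distinction between inverse and forward projection can be illustrated with a toy 1D example. This is only a sketch of the general idea (pulling target pixels from the source versus pushing source pixels to the target), not the paper's neural forward projection module. The function names, the integer motion values, the z-buffer rule for collisions, and the sample row are all assumptions made for illustration.

```python
def forward_project(src, motion, depth):
    """Push each source pixel x to x + motion[x]; on collisions the nearest depth wins."""
    out = [None] * len(src)            # None marks a hole: no source pixel landed here
    zbuf = [float("inf")] * len(src)   # depth of the pixel currently stored at each target
    for x, (value, d, z) in enumerate(zip(src, motion, depth)):
        t = x + d
        if 0 <= t < len(src) and z < zbuf[t]:
            out[t], zbuf[t] = value, z
    return out


def inverse_project(src, motion_at_target):
    """Pull each target pixel x from source position x - motion_at_target[x]."""
    out = [None] * len(src)
    for x, d in enumerate(motion_at_target):
        s = x - d
        if 0 <= s < len(src):
            out[x] = src[s]
    return out


# Hypothetical 1D scene: a near object (value 9) at pixels 2-3 moves right by 2.
src = [0, 0, 9, 9, 0, 0]
depth = [5, 5, 1, 1, 5, 5]
motion_on_source = [0, 0, 2, 2, 0, 0]   # motion attached to where the object is now
motion_on_target = [0, 0, 0, 0, 2, 2]   # motion attached to where the object ends up

forward = forward_project(src, motion_on_source, depth)
inverse = inverse_project(src, motion_on_target)
print("forward:", forward)
print("inverse:", inverse)
```

In this toy case the forward pass moves the object and leaves explicit holes where it used to be, while the inverse pass fills those positions by sampling the unmoved source, so the object appears twice. Real pipelines work in 2D with sub-pixel motion and need interpolation and hole handling that this sketch omits.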

Results

Task              Dataset     Metric                            Value  Model
Depth Estimation  Cityscapes  Absolute relative error (AbsRel)  0.111  Lee et al.
Depth Estimation  Cityscapes  RMSE                              6.437  Lee et al.
Depth Estimation  Cityscapes  RMSE log                          0.182  Lee et al.
Depth Estimation  Cityscapes  Square relative error (SqRel)     1.158  Lee et al.
3D                Cityscapes  Absolute relative error (AbsRel)  0.111  Lee et al.
3D                Cityscapes  RMSE                              6.437  Lee et al.
3D                Cityscapes  RMSE log                          0.182  Lee et al.
3D                Cityscapes  Square relative error (SqRel)     1.158  Lee et al.
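
The four metrics in the table are not defined on this page. As a rough sketch, assuming the definitions conventionally used in monocular depth evaluation, they could be computed as below. The function name `depth_metrics` and the sample depths are hypothetical, and the sketch ignores evaluation details the page does not state, such as depth caps, crops, or scale alignment.

```python
import math


def depth_metrics(gt, pred):
    """Compute AbsRel, SqRel, RMSE and RMSE log over paired positive depths.

    Assumes the conventional definitions; gt and pred are flat sequences of
    valid (strictly positive) depths, already matched pixel for pixel.
    """
    if len(gt) != len(pred) or len(gt) == 0:
        raise ValueError("gt and pred must be non-empty and of equal length")
    n = len(gt)
    pairs = list(zip(gt, pred))
    abs_rel = sum(abs(g - p) / g for g, p in pairs) / n          # mean |gt - pred| / gt
    sq_rel = sum((g - p) ** 2 / g for g, p in pairs) / n         # mean (gt - pred)^2 / gt
    rmse = math.sqrt(sum((g - p) ** 2 for g, p in pairs) / n)    # root mean squared error
    rmse_log = math.sqrt(
        sum((math.log(g) - math.log(p)) ** 2 for g, p in pairs) / n
    )                                                            # RMSE in log-depth space
    return {"abs_rel": abs_rel, "sq_rel": sq_rel, "rmse": rmse, "rmse_log": rmse_log}


# Hypothetical depths in metres, for illustration only.
metrics = depth_metrics([10.0, 20.0], [11.0, 18.0])
print(metrics)
```

All four are error measures, so lower values are better.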

Related Papers

SeC: Advancing Complex Video Object Segmentation via Progressive Concept Construction (2025-07-21)
Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)
DINO-VO: A Feature-based Visual Odometry Leveraging a Visual Foundation Model (2025-07-17)
DiffOSeg: Omni Medical Image Segmentation via Multi-Expert Collaboration Diffusion Model (2025-07-17)
SCORE: Scene Context Matters in Open-Vocabulary Remote Sensing Instance Segmentation (2025-07-17)
Unified Medical Image Segmentation with State Space Modeling Snake (2025-07-17)
A Privacy-Preserving Semantic-Segmentation Method Using Domain-Adaptation Technique (2025-07-17)
SAMST: A Transformer framework based on SAM pseudo label filtering for remote sensing semi-supervised semantic segmentation (2025-07-16)