Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

DSGN++: Exploiting Visual-Spatial Relation for Stereo-based 3D Detectors

Yilun Chen, Shijia Huang, Shu Liu, Bei Yu, Jiaya Jia

2022-04-06
Tags: 3D geometry, 3D Object Detection From Stereo Images, cross-modal alignment, 3D Object Detection

Abstract

Camera-based 3D object detectors are attractive because cameras are more widely deployed and cheaper than LiDAR sensors. We first revisit the prior stereo detector DSGN and the way it constructs stereo volumes to represent both 3D geometry and semantics. We refine the stereo modeling and propose an advanced version, DSGN++, which aims to improve information flow throughout the 2D-to-3D pipeline in three main aspects. First, to effectively lift 2D information to the stereo volume, we propose depth-wise plane sweeping (DPS), which allows denser connections and extracts depth-guided features. Second, to capture differently spaced features, we present a novel stereo volume, the Dual-view Stereo Volume (DSV), which integrates front-view and top-view features and reconstructs sub-voxel depth in the camera frustum. Third, as the foreground region becomes less dominant in 3D space, we propose a multi-modal data editing strategy, Stereo-LiDAR Copy-Paste, which ensures cross-modal alignment and improves data efficiency. Without bells and whistles, extensive experiments in various modality setups on the popular KITTI benchmark show that our method consistently outperforms other camera-based 3D detectors for all categories. Code is available at https://github.com/chenyilun95/DSGN2.
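To make the idea of lifting 2D features into a stereo volume concrete, here is a minimal sketch of a generic plane-sweep volume built by concatenating left features with right features shifted by a candidate disparity. It is not the paper's DPS or DSV; the function names, the integer-disparity shifts, and the camera parameters (`focal_px`, `baseline_m`) are assumptions chosen for illustration. The only relation taken as given is the standard rectified-stereo identity d = f · B / z.

```python
import numpy as np


def depth_to_disparity(depth_m, focal_px, baseline_m):
    """Disparity in pixels for a rectified stereo pair: d = f * B / z."""
    return focal_px * baseline_m / np.asarray(depth_m, dtype=np.float64)


def plane_sweep_volume(left_feat, right_feat, disparities):
    """Build a concatenation-based stereo volume (illustrative only).

    left_feat, right_feat: arrays of shape (C, H, W) from a rectified pair.
    disparities: iterable of integer pixel shifts, one per candidate plane.
    Returns an array of shape (2C, D, H, W), where D = len(disparities).
    """
    channels, height, width = left_feat.shape
    volume = np.zeros(
        (2 * channels, len(disparities), height, width), dtype=left_feat.dtype
    )
    for i, d in enumerate(disparities):
        d = int(d)
        if not 0 <= d < width:
            continue  # shift falls outside the image; leave this plane zero
        # A left pixel at column x corresponds to the right pixel at x - d.
        volume[:channels, i, :, d:] = left_feat[:, :, d:]
        volume[channels:, i, :, d:] = right_feat[:, :, : width - d]
    return volume


# Usage with made-up camera parameters and random features (assumptions).
rng = np.random.default_rng(0)
left = rng.random((8, 4, 96), dtype=np.float32)
right = rng.random((8, 4, 96), dtype=np.float32)

depths = np.array([5.0, 10.0, 20.0, 40.0])  # candidate depth planes in metres
disps = np.rint(depth_to_disparity(depths, focal_px=700.0, baseline_m=0.5)).astype(int)

vol = plane_sweep_volume(left, right, disps)
print(vol.shape)  # (16, 4, 4, 96)
```

Sampling planes uniformly in depth rather than in disparity, as in the usage lines, is one common design choice for 3D detection because the resulting volume is regular in metric space; a real implementation would typically use sub-pixel interpolation instead of integer shifts.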

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Object Detection | KITTI Cars Moderate | AP75 | 67.37 | DSGN++ |
| Object Detection | KITTI Cyclists Moderate | AP50 | 43.9 | DSGN++ |
| Object Detection | KITTI Pedestrians Moderate | AP50 | 32.74 | DSGN++ |
| 3D | KITTI Cars Moderate | AP75 | 67.37 | DSGN++ |
| 3D | KITTI Cyclists Moderate | AP50 | 43.9 | DSGN++ |
| 3D | KITTI Pedestrians Moderate | AP50 | 32.74 | DSGN++ |
| 3D Object Detection | KITTI Cars Moderate | AP75 | 67.37 | DSGN++ |
| 3D Object Detection | KITTI Cyclists Moderate | AP50 | 43.9 | DSGN++ |
| 3D Object Detection | KITTI Pedestrians Moderate | AP50 | 32.74 | DSGN++ |
| 2D Classification | KITTI Cars Moderate | AP75 | 67.37 | DSGN++ |
| 2D Classification | KITTI Cyclists Moderate | AP50 | 43.9 | DSGN++ |
| 2D Classification | KITTI Pedestrians Moderate | AP50 | 32.74 | DSGN++ |
| 2D Object Detection | KITTI Cars Moderate | AP75 | 67.37 | DSGN++ |
| 2D Object Detection | KITTI Cyclists Moderate | AP50 | 43.9 | DSGN++ |
| 2D Object Detection | KITTI Pedestrians Moderate | AP50 | 32.74 | DSGN++ |
| 16k | KITTI Cars Moderate | AP75 | 67.37 | DSGN++ |
| 16k | KITTI Cyclists Moderate | AP50 | 43.9 | DSGN++ |
| 16k | KITTI Pedestrians Moderate | AP50 | 32.74 | DSGN++ |

Related Papers

Transformer-based Spatial Grounding: A Comprehensive Survey (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
Robust 3D-Masked Part-level Editing in 3D Gaussian Splatting with Regularized Score Distillation Sampling (2025-07-15)
TRAN-D: 2D Gaussian Splatting-based Sparse-view Transparent Object Depth Reconstruction via Physics Simulation for Scene Update (2025-07-15)
CATVis: Context-Aware Thought Visualization (2025-07-15)
Bridge Feature Matching and Cross-Modal Alignment with Mutual-filtering for Zero-shot Anomaly Detection (2025-07-15)
Evaluating Attribute Confusion in Fashion Text-to-Image Generation (2025-07-09)
Feed-Forward SceneDINO for Unsupervised Semantic Scene Completion (2025-07-08)