Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

BEVFormer v2: Adapting Modern Image Backbones to Bird's-Eye-View Recognition via Perspective Supervision

Chenyu Yang, Yuntao Chen, Hao Tian, Chenxin Tao, Xizhou Zhu, Zhaoxiang Zhang, Gao Huang, Hongyang Li, Yu Qiao, Lewei Lu, Jie Zhou, Jifeng Dai

2022-11-18 | CVPR 2023 | 3D Object Detection

Abstract

We present a novel bird's-eye-view (BEV) detector with perspective supervision, which converges faster and better suits modern image backbones. Existing state-of-the-art BEV detectors are often tied to certain depth pre-trained backbones like VoVNet, hindering the synergy between booming image backbones and BEV detectors. To address this limitation, we prioritize easing the optimization of BEV detectors by introducing perspective space supervision. To this end, we propose a two-stage BEV detector, where proposals from the perspective head are fed into the bird's-eye-view head for final predictions. To evaluate the effectiveness of our model, we conduct extensive ablation studies focusing on the form of supervision and the generality of the proposed detector. The proposed method is verified with a wide spectrum of traditional and modern image backbones and achieves new SoTA results on the large-scale nuScenes dataset. The code shall be released soon.
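The two-stage hand-off described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it only assumes that first-stage perspective proposals are filtered by score, reduced to points on the BEV plane, and passed to the second stage alongside a set of learned reference points. The names `BEV_RANGE`, `project_to_bev`, `second_stage_reference_points`, and the `score_threshold` default are all hypothetical.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Point2D = Tuple[float, float]

# Hypothetical BEV extent in metres: (x_min, y_min, x_max, y_max).
BEV_RANGE = (-50.0, -50.0, 50.0, 50.0)


def project_to_bev(center: Point3D, bev_range=BEV_RANGE) -> Point2D:
    """Drop the height and normalise (x, y) to [0, 1] on the BEV plane."""
    x_min, y_min, x_max, y_max = bev_range
    x, y, _ = center
    u = (x - x_min) / (x_max - x_min)
    v = (y - y_min) / (y_max - y_min)
    # Clamp so proposals slightly outside the range still give valid points.
    return (min(max(u, 0.0), 1.0), min(max(v, 0.0), 1.0))


def second_stage_reference_points(
    proposal_centers: Sequence[Point3D],
    proposal_scores: Sequence[float],
    learned_points: Sequence[Point2D],
    score_threshold: float = 0.3,  # assumed value, for illustration only
) -> List[Point2D]:
    """Combine confident first-stage proposals with learned reference points."""
    kept = [
        project_to_bev(center)
        for center, score in zip(proposal_centers, proposal_scores)
        if score >= score_threshold
    ]
    # Proposal-derived points come first, followed by the learned ones.
    return kept + list(learned_points)
```

In this sketch the BEV head would receive the combined list as starting locations for its queries, so a poor first stage still leaves the learned points as a fallback.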

Results

Task                 Dataset               Metric  Value  Model
Object Detection     nuScenes Camera Only  NDS     63.4   BEVFormer v2 (InternImage-XL)
Object Detection     Rope3D                AP@0.7  24.64  BEVFormer
3D                   nuScenes Camera Only  NDS     63.4   BEVFormer v2 (InternImage-XL)
3D                   Rope3D                AP@0.7  24.64  BEVFormer
3D Object Detection  nuScenes Camera Only  NDS     63.4   BEVFormer v2 (InternImage-XL)
3D Object Detection  Rope3D                AP@0.7  24.64  BEVFormer
2D Classification    nuScenes Camera Only  NDS     63.4   BEVFormer v2 (InternImage-XL)
2D Classification    Rope3D                AP@0.7  24.64  BEVFormer
2D Object Detection  nuScenes Camera Only  NDS     63.4   BEVFormer v2 (InternImage-XL)
2D Object Detection  Rope3D                AP@0.7  24.64  BEVFormer
16k                  nuScenes Camera Only  NDS     63.4   BEVFormer v2 (InternImage-XL)
16k                  Rope3D                AP@0.7  24.64  BEVFormer
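
For readers unfamiliar with the NDS metric reported above: the nuScenes detection score is a weighted combination of mAP and five true-positive error metrics (translation, scale, orientation, velocity, attribute), each clipped to at most 1 and converted to a score. The sketch below follows that published definition; the function name is hypothetical, and the page does not list the component values behind the 63.4 figure, so none are assumed here. The function returns a fraction in [0, 1], whereas leaderboards typically show it multiplied by 100.

```python
from typing import Sequence


def nuscenes_detection_score(mean_ap: float, tp_errors: Sequence[float]) -> float:
    """NDS = (5 * mAP + sum(1 - min(1, err) for each TP error)) / 10."""
    if len(tp_errors) != 5:
        raise ValueError("expected five true-positive error metrics")
    # Each error is clipped to 1 so that it maps to a score in [0, 1].
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0
```

Because mAP carries half of the total weight, two detectors with the same mAP can still differ in NDS by up to 0.5 depending on their localization, velocity, and attribute errors.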

Related Papers

Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations (2025-07-07)
MambaFusion: Height-Fidelity Dense Global Fusion for Multi-modal 3D Object Detection (2025-07-06)
A Survey of Multi-sensor Fusion Perception for Embodied AI: Background, Methods, Challenges and Prospects (2025-06-24)
Teleoperated Driving: a New Challenge for 3D Object Detection in Compressed Point Clouds (2025-06-13)
Vision-based Lifting of 2D Object Detections for Automated Driving (2025-06-13)
DySS: Dynamic Queries and State-Space Learning for Efficient 3D Object Detection from Multi-Camera Videos (2025-06-11)
Gaussian2Scene: 3D Scene Representation Learning via Self-supervised Learning with 3D Gaussian Splatting (2025-06-10)