Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

BEVDet4D: Exploit Temporal Cues in Multi-camera 3D Object Detection

JunJie Huang, Guan Huang

2022-03-31 | object-detection | 3D Object Detection | Object Detection

Abstract

A single frame contains limited information, which caps the performance of existing vision-based multi-camera 3D object detection paradigms. To push the performance boundary in this area, we propose BEVDet4D, a paradigm that lifts the scalable BEVDet paradigm from the spatial-only 3D space to the spatial-temporal 4D space. We upgrade the naive BEVDet framework with a few modifications that fuse the feature from the previous frame with the corresponding one in the current frame. In this way, with a negligible additional computing budget, BEVDet4D can access temporal cues by querying and comparing the two candidate features. Beyond this, we simplify the task of velocity prediction by removing the factors of ego-motion and time from the learning target. As a result, BEVDet4D, with robust generalization performance, reduces the velocity error by up to 62.9%. This makes vision-based methods, for the first time, comparable with those relying on LiDAR or radar in this aspect. On the challenging nuScenes benchmark, we report a new record of 54.5% NDS with the high-performance configuration dubbed BEVDet4D-Base, which surpasses the previous leading method BEVDet-Base by +7.3% NDS. The source code is publicly available for further research at https://github.com/HuangJunJie2017/BEVDet.
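The abstract states that ego-motion and time are removed from the velocity learning target. One way to read this, sketched below, is that the regression target becomes the object's displacement between two adjacent frames after the previous position has been moved into the current ego frame, with velocity recovered afterwards by dividing by the frame interval. This is an illustrative sketch, not the authors' implementation: the 2D `(x, y, yaw)` pose convention and all function names are assumptions made for the example.

```python
import math

# Hypothetical helpers: poses are (x, y, yaw) of the ego vehicle in a global frame.

def ego_to_global(point, ego_pose):
    """Map a 2D point from an ego frame into the global frame."""
    px, py = point
    ex, ey, yaw = ego_pose
    c, s = math.cos(yaw), math.sin(yaw)
    return (ex + c * px - s * py, ey + s * px + c * py)

def global_to_ego(point, ego_pose):
    """Map a 2D point from the global frame into an ego frame."""
    gx, gy = point
    ex, ey, yaw = ego_pose
    c, s = math.cos(yaw), math.sin(yaw)
    dx, dy = gx - ex, gy - ey
    return (c * dx + s * dy, -s * dx + c * dy)

def displacement_target(prev_pos_prev_ego, curr_pos_curr_ego,
                        prev_ego_pose, curr_ego_pose):
    """Object displacement between adjacent frames with ego-motion removed.

    The previous position is first re-expressed in the current ego frame,
    so a static object yields a zero target however the ego vehicle moved.
    """
    prev_in_curr = global_to_ego(
        ego_to_global(prev_pos_prev_ego, prev_ego_pose), curr_ego_pose)
    return (curr_pos_curr_ego[0] - prev_in_curr[0],
            curr_pos_curr_ego[1] - prev_in_curr[1])

def velocity_from_displacement(displacement, dt):
    """Recover velocity outside the network by dividing by the frame interval."""
    return (displacement[0] / dt, displacement[1] / dt)

# A static object seen from a moving, turning ego vehicle.
prev_pose, curr_pose = (0.0, 0.0, 0.0), (2.0, 0.0, 0.1)
static_prev = global_to_ego((10.0, 5.0), prev_pose)
static_curr = global_to_ego((10.0, 5.0), curr_pose)
static_disp = displacement_target(static_prev, static_curr, prev_pose, curr_pose)

# An object that moved 1 m along global x while the ego drove 2 m forward.
moving_disp = displacement_target((10.0, 5.0), (9.0, 5.0),
                                  (0.0, 0.0, 0.0), (2.0, 0.0, 0.0))
moving_vel = velocity_from_displacement(moving_disp, dt=0.5)
```

Under these assumptions the target depends only on the object's own motion, so the network does not have to untangle ego-motion or variable frame intervals from the features it compares.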

Results

Task | Dataset | Metric | Value | Model
--- | --- | --- | --- | ---
Object Detection | nuScenes Camera Only | NDS | 56.9 | BEVDet4D
Object Detection | nuScenes | NDS | 0.569 | BEVDet4D
Object Detection | nuScenes | mAAE | 0.121 | BEVDet4D
Object Detection | nuScenes | mAOE | 0.386 | BEVDet4D
Object Detection | nuScenes | mAP | 0.451 | BEVDet4D
Object Detection | nuScenes | mASE | 0.241 | BEVDet4D
Object Detection | nuScenes | mATE | 0.511 | BEVDet4D
Object Detection | nuScenes | mAVE | 0.301 | BEVDet4D
3D | nuScenes Camera Only | NDS | 56.9 | BEVDet4D
3D | nuScenes | NDS | 0.569 | BEVDet4D
3D | nuScenes | mAAE | 0.121 | BEVDet4D
3D | nuScenes | mAOE | 0.386 | BEVDet4D
3D | nuScenes | mAP | 0.451 | BEVDet4D
3D | nuScenes | mASE | 0.241 | BEVDet4D
3D | nuScenes | mATE | 0.511 | BEVDet4D
3D | nuScenes | mAVE | 0.301 | BEVDet4D
3D Object Detection | nuScenes Camera Only | NDS | 56.9 | BEVDet4D
3D Object Detection | nuScenes | NDS | 0.569 | BEVDet4D
3D Object Detection | nuScenes | mAAE | 0.121 | BEVDet4D
3D Object Detection | nuScenes | mAOE | 0.386 | BEVDet4D
3D Object Detection | nuScenes | mAP | 0.451 | BEVDet4D
3D Object Detection | nuScenes | mASE | 0.241 | BEVDet4D
3D Object Detection | nuScenes | mATE | 0.511 | BEVDet4D
3D Object Detection | nuScenes | mAVE | 0.301 | BEVDet4D
2D Classification | nuScenes Camera Only | NDS | 56.9 | BEVDet4D
2D Classification | nuScenes | NDS | 0.569 | BEVDet4D
2D Classification | nuScenes | mAAE | 0.121 | BEVDet4D
2D Classification | nuScenes | mAOE | 0.386 | BEVDet4D
2D Classification | nuScenes | mAP | 0.451 | BEVDet4D
2D Classification | nuScenes | mASE | 0.241 | BEVDet4D
2D Classification | nuScenes | mATE | 0.511 | BEVDet4D
2D Classification | nuScenes | mAVE | 0.301 | BEVDet4D
2D Object Detection | nuScenes Camera Only | NDS | 56.9 | BEVDet4D
2D Object Detection | nuScenes | NDS | 0.569 | BEVDet4D
2D Object Detection | nuScenes | mAAE | 0.121 | BEVDet4D
2D Object Detection | nuScenes | mAOE | 0.386 | BEVDet4D
2D Object Detection | nuScenes | mAP | 0.451 | BEVDet4D
2D Object Detection | nuScenes | mASE | 0.241 | BEVDet4D
2D Object Detection | nuScenes | mATE | 0.511 | BEVDet4D
2D Object Detection | nuScenes | mAVE | 0.301 | BEVDet4D
16k | nuScenes Camera Only | NDS | 56.9 | BEVDet4D
16k | nuScenes | NDS | 0.569 | BEVDet4D
16k | nuScenes | mAAE | 0.121 | BEVDet4D
16k | nuScenes | mAOE | 0.386 | BEVDet4D
16k | nuScenes | mAP | 0.451 | BEVDet4D
16k | nuScenes | mASE | 0.241 | BEVDet4D
16k | nuScenes | mATE | 0.511 | BEVDet4D
16k | nuScenes | mAVE | 0.301 | BEVDet4D
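The NDS value in the table can be cross-checked against the other rows. The sketch below uses the standard nuScenes detection score definition, NDS = (5 * mAP + sum over the five true-positive errors of (1 - min(1, error))) / 10, and plugs in the mAP, mATE, mASE, mAOE, mAVE, and mAAE values listed above. The function and variable names are illustrative only.

```python
def nuscenes_detection_score(m_ap, tp_errors):
    """Combine mAP and the five mean true-positive errors into NDS.

    Each error is clipped to 1 and turned into a score (1 - error);
    mAP carries a weight of 5 and each error score a weight of 1.
    """
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors.values()]
    return (5.0 * m_ap + sum(tp_scores)) / 10.0

# Values taken from the nuScenes rows of the results table.
tp_errors = {
    "mATE": 0.511,  # translation error
    "mASE": 0.241,  # scale error
    "mAOE": 0.386,  # orientation error
    "mAVE": 0.301,  # velocity error
    "mAAE": 0.121,  # attribute error
}
score = nuscenes_detection_score(0.451, tp_errors)
print(f"NDS = {score:.4f}")
```

The result is roughly 0.5695, which agrees with the listed NDS of 0.569 (56.9 on the percentage scale used in the "nuScenes Camera Only" rows).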

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge (2025-07-08)
Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations (2025-07-07)