
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


FocalFormer3D : Focusing on Hard Instance for 3D Object Detection

Yilun Chen, Zhiding Yu, Yukang Chen, Shiyi Lan, Animashree Anandkumar, Jiaya Jia, Jose Alvarez

2023-08-08 | Autonomous Driving, object-detection, 3D Object Detection, Object Detection

Abstract

False negatives (FN) in 3D object detection, e.g., missed predictions of pedestrians, vehicles, or other obstacles, can lead to potentially dangerous situations in autonomous driving. Although potentially fatal, this issue is understudied in many current 3D detection methods. In this work, we propose Hard Instance Probing (HIP), a general pipeline that identifies FN in a multi-stage manner and guides models to focus on excavating difficult instances. For 3D object detection, we instantiate this method as FocalFormer3D, a simple yet effective detector that excels at excavating difficult objects and improving prediction recall. FocalFormer3D features multi-stage query generation to discover hard objects and a box-level transformer decoder to efficiently distinguish objects from massive object candidates. Experimental results on the nuScenes and Waymo datasets validate the superior performance of FocalFormer3D. This advantage leads to strong performance on both detection and tracking, in both LiDAR and multi-modal settings. Notably, FocalFormer3D achieves 70.5 mAP and 73.9 NDS on the nuScenes detection benchmark and 72.1 AMOTA on the nuScenes tracking benchmark, both ranking 1st on the nuScenes LiDAR leaderboard. Our code is available at https://github.com/NVlabs/FocalFormer3D.
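The multi-stage idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function name `multi_stage_candidates`, the grid-shaped score map, and the square masking window are all assumptions made for illustration, since the abstract does not say how candidates from earlier stages are excluded. The sketch only shows why probing in stages can surface weaker instances that a single top-k selection would miss.

```python
from typing import List, Set, Tuple

Cell = Tuple[int, int]


def multi_stage_candidates(
    heatmap: List[List[float]],
    num_stages: int = 3,
    top_k: int = 1,
    mask_radius: int = 1,
) -> List[List[Cell]]:
    """Pick top_k cells per stage, hiding the neighbourhood of earlier picks.

    Hypothetical helper: the masking rule is an assumption, not taken
    from the paper.
    """
    rows, cols = len(heatmap), len(heatmap[0])
    masked: Set[Cell] = set()
    stages: List[List[Cell]] = []
    for _ in range(num_stages):
        # Rank only the cells that earlier stages have not claimed.
        free = [
            (heatmap[r][c], (r, c))
            for r in range(rows)
            for c in range(cols)
            if (r, c) not in masked
        ]
        free.sort(key=lambda item: item[0], reverse=True)
        picked = [cell for _, cell in free[:top_k]]
        stages.append(picked)
        # Hide a square window around each pick so later stages look elsewhere.
        for r, c in picked:
            for dr in range(-mask_radius, mask_radius + 1):
                for dc in range(-mask_radius, mask_radius + 1):
                    masked.add((r + dr, c + dc))
    return stages


# One strong peak in the top-left corner and two weak responses elsewhere.
heatmap = [
    [0.9, 0.8, 0.1, 0.0],
    [0.7, 0.6, 0.1, 0.3],
    [0.1, 0.1, 0.1, 0.2],
    [0.0, 0.4, 0.1, 0.1],
]
stages = multi_stage_candidates(heatmap, num_stages=3, top_k=1, mask_radius=1)
print(stages)  # [[(0, 0)], [(3, 1)], [(1, 3)]]
```

A single-stage top-3 selection on the same grid would return three cells from the one strong peak, while the staged version reaches the two weak responses in its second and third stages.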

Results

Dataset: nuScenes. The same values are listed under each of the tasks Object Detection, 3D, 3D Object Detection, 2D Classification, 2D Object Detection, and 16k.

Model              NDS   mAP   mATE  mASE  mAOE  mAVE  mAAE
FocalFormer3D-F    0.75  0.72  0.25  0.24  0.33  0.23  0.13
FocalFormer3D-TTA  0.74  0.71  0.24  0.24  0.32  0.2   0.13
FocalFormer3D-L    0.73  0.69  0.25  0.24  0.34  0.22  0.13
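
The NDS values above are not independent of the other metrics. In the nuScenes benchmark definition, NDS combines mAP with the five true-positive error metrics as NDS = (5 * mAP + sum of (1 - min(1, error))) / 10. The sketch below applies that formula to the rounded FocalFormer3D-F numbers listed on this page; the function name `nds` is a hypothetical helper, and because the inputs are rounded to two decimals the result only approximates the listed score.

```python
def nds(m_ap: float, m_ate: float, m_ase: float,
        m_aoe: float, m_ave: float, m_aae: float) -> float:
    """nuScenes Detection Score from mAP and the five mean TP errors."""
    tp_errors = [m_ate, m_ase, m_aoe, m_ave, m_aae]
    # Each error is clipped to 1 and turned into a score in [0, 1].
    tp_scores = [1.0 - min(1.0, err) for err in tp_errors]
    # mAP carries weight 5; each of the five TP scores carries weight 1.
    return (5.0 * m_ap + sum(tp_scores)) / 10.0


# Rounded FocalFormer3D-F values from the results listed on this page.
score = nds(m_ap=0.72, m_ate=0.25, m_ase=0.24, m_aoe=0.33, m_ave=0.23, m_aae=0.13)
print(round(score, 3))  # 0.742
```

The recomputed 0.742 sits slightly below the listed 0.75, which is consistent with the page showing every metric rounded to two decimals.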

Related Papers

GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving (2025-07-19)
AGENTS-LLM: Augmentative GENeration of Challenging Traffic Scenarios with an Agentic LLM Framework (2025-07-18)
World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving (2025-07-17)
Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models (2025-07-17)
Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)
LaViPlan : Language-Guided Visual Path Planning with RLVR (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)