Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Robust Multimodal 3D Object Detection via Modality-Agnostic Decoding and Proximity-based Modality Ensemble

Juhan Cha, Minseok Joo, Jihwan Park, Sanghyeok Lee, Injae Kim, Hyunwoo J. Kim

Published: 2024-07-27 · Tasks: 3D Object Detection, Object Detection
Paper · PDF · Code (official)

Abstract

Recent advancements in 3D object detection have benefited from multi-modal information from multi-view cameras and LiDAR sensors. However, the inherent disparities between the modalities pose substantial challenges. We observe that existing multi-modal 3D object detection methods rely heavily on the LiDAR sensor, treating the camera as an auxiliary modality for augmenting semantic details. This often leads not only to underutilization of camera data but also to significant performance degradation in scenarios where LiDAR data is unavailable. Additionally, existing fusion methods overlook the detrimental impact of sensor noise induced by environmental changes on detection performance. In this paper, we propose MEFormer to address the LiDAR over-reliance problem by harnessing critical information for 3D object detection from every available modality while concurrently safeguarding against corrupted signals during the fusion process. Specifically, we introduce Modality Agnostic Decoding (MOAD), which extracts geometric and semantic features with a shared transformer decoder regardless of input modalities and provides promising improvements in both single-modality and multi-modality settings. Additionally, our Proximity-based Modality Ensemble (PME) module adaptively exploits the strengths of each modality depending on the environment while mitigating the effects of a noisy sensor. Our MEFormer achieves state-of-the-art performance of 73.9% NDS and 71.5% mAP on the nuScenes validation set. Extensive analyses validate that MEFormer improves robustness under challenging conditions such as sensor malfunctions or environmental changes. The source code is available at https://github.com/hanchaa/MEFormer.
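The core intuition behind the Proximity-based Modality Ensemble can be illustrated with a highly simplified sketch: predictions from two modalities are matched by spatial proximity of their box centers, and matched pairs are merged with confidence weighting. The function name, the single distance threshold, and the weighted-average merge rule below are illustrative assumptions, not the paper's actual PME module.

```python
import numpy as np

def proximity_ensemble(centers_a, scores_a, centers_b, scores_b, radius=1.0):
    """Toy proximity-based fusion of two modalities' detections.

    For each prediction from modality A, find modality B's nearest
    prediction; if its center lies within `radius`, blend the two
    centers weighted by confidence, otherwise keep A's prediction.
    """
    fused = []
    for ca, sa in zip(centers_a, scores_a):
        # distance from this A-prediction to every B-prediction
        d = np.linalg.norm(centers_b - ca, axis=1)
        j = int(np.argmin(d))
        if d[j] < radius:
            w = sa / (sa + scores_b[j])  # confidence-based weight
            fused.append(w * ca + (1.0 - w) * centers_b[j])
        else:
            fused.append(ca)  # no nearby match: fall back to modality A
    return np.array(fused)
```

In this toy form, a noisy sensor whose boxes drift away from the other modality's simply stops influencing the fused output; the paper's actual module learns this behavior adaptively rather than using a fixed radius.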

Results

Task                | Dataset  | Metric | Value | Model
3D Object Detection | nuScenes | NDS    | 0.74  | MEFormer
3D Object Detection | nuScenes | mAP    | 0.72  | MEFormer
3D Object Detection | nuScenes | mATE   | 0.27  | MEFormer
3D Object Detection | nuScenes | mASE   | 0.24  | MEFormer
3D Object Detection | nuScenes | mAOE   | 0.3   | MEFormer
3D Object Detection | nuScenes | mAVE   | 0.27  | MEFormer
3D Object Detection | nuScenes | mAAE   | 0.11  | MEFormer

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
Vision-based Perception for Autonomous Vehicles in Obstacle Avoidance Scenarios (2025-07-16)
Tomato Multi-Angle Multi-Pose Dataset for Fine-Grained Phenotyping (2025-07-15)
ECORE: Energy-Conscious Optimized Routing for Deep Learning Models at the Edge (2025-07-08)
Beyond One Shot, Beyond One Perspective: Cross-View and Long-Horizon Distillation for Better LiDAR Representations (2025-07-07)