When Pedestrian Detection Meets Multi-Modal Learning: Generalist Model and Benchmark Dataset

Yi Zhang, Wang Zeng, Sheng Jin, Chen Qian, Ping Luo, Wentao Liu

Date: 2024-07-14
Tasks: Multispectral Object Detection, Pedestrian Detection, 3D Object Detection, Object Detection

Abstract

Recent years have witnessed increasing research attention towards pedestrian detection that takes advantage of different sensor modalities (e.g., RGB, IR, Depth, LiDAR, and Event). However, designing a unified generalist model that can effectively process diverse sensor modalities remains a challenge. This paper introduces MMPedestron, a novel generalist model for multimodal perception. Unlike previous specialist models that process only one modality or one specific pair of modalities, MMPedestron can process multiple modal inputs and their dynamic combinations. The proposed approach comprises a unified encoder for modal representation and fusion and a general head for pedestrian detection. We introduce two extra learnable tokens, i.e., MAA and MAF, for adaptive multi-modal feature fusion. In addition, we construct the MMPD dataset, the first large-scale benchmark for multi-modal pedestrian detection. This benchmark incorporates existing public datasets and a newly collected dataset called EventPed, covering a wide range of sensor modalities including RGB, IR, Depth, LiDAR, and Event data. With multi-modal joint training, our model achieves state-of-the-art performance on a wide range of pedestrian detection benchmarks, surpassing leading models tailored to specific sensor modalities. For example, it achieves 71.1 AP on COCO-Persons and 72.6 AP on LLVIP. Notably, our model achieves performance comparable to the InternImage-H model on CrowdHuman with 30x fewer parameters. Code and data are available at https://github.com/BubblyYi/MMPedestron.
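
The idea of handling "dynamic combinations" of modalities can be pictured with a small sketch. This is not the authors' implementation: the function name `build_sequence`, the fixed modality order, and the placement of the two extra tokens at the front of the sequence are all assumptions made for illustration. It shows only how a single encoder input could be assembled from whichever modalities happen to be present.

```python
from typing import Dict, List

Token = List[float]

# Modalities named in the abstract; the ordering here is an assumption.
MODALITIES = ("RGB", "IR", "Depth", "LiDAR", "Event")


def build_sequence(inputs: Dict[str, List[Token]], maa: Token, maf: Token) -> List[Token]:
    """Assemble one token sequence from any non-empty subset of modalities.

    `maa` and `maf` stand in for the two extra learnable tokens mentioned in
    the abstract; putting them first is a hypothetical choice.
    """
    unknown = set(inputs) - set(MODALITIES)
    if unknown:
        raise ValueError(f"unknown modalities: {sorted(unknown)}")
    if not inputs:
        raise ValueError("at least one modality is required")

    sequence: List[Token] = [list(maa), list(maf)]
    for name in MODALITIES:
        # Missing modalities simply contribute no tokens.
        sequence.extend(inputs.get(name, []))
    return sequence


# The same function accepts a single modality or a pair.
rgb_only = build_sequence({"RGB": [[0.1, 0.2], [0.3, 0.4]]}, [0.0, 0.0], [1.0, 1.0])
rgb_ir = build_sequence(
    {"IR": [[0.5, 0.6]], "RGB": [[0.1, 0.2], [0.3, 0.4]]}, [0.0, 0.0], [1.0, 1.0]
)
```

Because the sequence length varies with the inputs, a downstream encoder in this sketch would need to accept variable-length sequences, as attention-based encoders typically do.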

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Autonomous Vehicles | MMPD-Dataset | box mAP | 79 | MMPedestron |
| Autonomous Vehicles | LLVIP | AP | 0.726 | MMPedestron |
| Object Detection | CrowdHuman (full body) | AP | 97.1 | MMPedestron |
| Object Detection | CrowdHuman (full body) | mMR | 30.8 | MMPedestron |
| Object Detection | InOutDoor | AP | 65.7 | MMPedestron |
| Object Detection | EventPed | AP | 79 | MMPedestron |
| Object Detection | STCrowd | AP | 74.9 | MMPedestron |
| 3D | CrowdHuman (full body) | AP | 97.1 | MMPedestron |
| 3D | CrowdHuman (full body) | mMR | 30.8 | MMPedestron |
| 3D | InOutDoor | AP | 65.7 | MMPedestron |
| 3D | EventPed | AP | 79 | MMPedestron |
| 3D | STCrowd | AP | 74.9 | MMPedestron |
| 2D Classification | CrowdHuman (full body) | AP | 97.1 | MMPedestron |
| 2D Classification | CrowdHuman (full body) | mMR | 30.8 | MMPedestron |
| 2D Classification | InOutDoor | AP | 65.7 | MMPedestron |
| 2D Classification | EventPed | AP | 79 | MMPedestron |
| 2D Classification | STCrowd | AP | 74.9 | MMPedestron |
| Pedestrian Detection | MMPD-Dataset | box mAP | 79 | MMPedestron |
| Pedestrian Detection | LLVIP | AP | 0.726 | MMPedestron |
| 2D Object Detection | CrowdHuman (full body) | AP | 97.1 | MMPedestron |
| 2D Object Detection | CrowdHuman (full body) | mMR | 30.8 | MMPedestron |
| 2D Object Detection | InOutDoor | AP | 65.7 | MMPedestron |
| 2D Object Detection | EventPed | AP | 79 | MMPedestron |
| 2D Object Detection | STCrowd | AP | 74.9 | MMPedestron |
| 16k | CrowdHuman (full body) | AP | 97.1 | MMPedestron |
| 16k | CrowdHuman (full body) | mMR | 30.8 | MMPedestron |
| 16k | InOutDoor | AP | 65.7 | MMPedestron |
| 16k | EventPed | AP | 79 | MMPedestron |
| 16k | STCrowd | AP | 74.9 | MMPedestron |
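
The results listed above report LLVIP AP as 0.726, while the abstract gives the same figure as 72.6 AP and the other entries appear to use a 0-100 scale. The sketch below puts the unique (dataset, metric) entries on one scale before comparison. The helper `to_percent` and its rule (values in [0, 1] are treated as fractions) are assumptions for illustration; the rule would misread a genuinely tiny score that was already on the 0-100 scale.

```python
from typing import List, Tuple

Row = Tuple[str, str, float]

# Unique (dataset, metric, value) entries copied from the results above.
ROWS: List[Row] = [
    ("MMPD-Dataset", "box mAP", 79.0),
    ("LLVIP", "AP", 0.726),
    ("CrowdHuman (full body)", "AP", 97.1),
    ("CrowdHuman (full body)", "mMR", 30.8),
    ("InOutDoor", "AP", 65.7),
    ("EventPed", "AP", 79.0),
    ("STCrowd", "AP", 74.9),
]


def to_percent(value: float) -> float:
    """Assumed heuristic: a value in [0, 1] is a fraction; rescale it to 0-100."""
    if 0.0 <= value <= 1.0:
        return round(value * 100, 1)
    return value


def normalized(rows: List[Row]) -> List[Row]:
    """Return the rows with every value expressed on the 0-100 scale."""
    return [(dataset, metric, to_percent(value)) for dataset, metric, value in rows]


for dataset, metric, value in normalized(ROWS):
    print(f"{dataset:24s} {metric:8s} {value:5.1f}")
```

Note that mMR is a miss-rate metric where lower is better, so it should not be ranked alongside AP even after rescaling.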
