Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

3D Random Occlusion and Multi-Layer Projection for Deep Multi-Camera Pedestrian Localization

Rui Qiu, Ming Xu, Yuyao Yan, Jeremy S. Smith, Xi Yang

2022-07-22 | Multiview Detection | Data Augmentation | Pedestrian Detection

Abstract

Although deep-learning-based methods for monocular pedestrian detection have made great progress, they remain vulnerable to heavy occlusion. Multi-view information fusion is a potential solution, but its application is limited by the lack of annotated training samples in existing multi-view datasets, which increases the risk of overfitting. To address this problem, a data augmentation method is proposed that randomly generates 3D cylinder occlusions on the ground plane; the cylinders are of average pedestrian size and are projected to multiple views, which reduces overfitting during training. Moreover, the feature map of each view is projected, using homographies, to multiple parallel planes at different heights, which allows the CNNs to fully utilize the features across the height of each pedestrian to infer pedestrian locations on the ground plane. The proposed 3DROM method greatly improves on the performance of state-of-the-art deep-learning-based methods for multi-view pedestrian detection.
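The two geometric ideas in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the toy camera, the cylinder radius and height (0.25 m and 1.8 m), the bounding-box approximation of the occluded region, and the helper names `project`, `plane_homography`, and `cylinder_mask` are all assumptions made for illustration. The only fact relied on is the standard pinhole model, under which a 3x4 camera matrix P restricted to the plane Z = h reduces to the 3x3 homography [p1, p2, h*p3 + p4].

```python
import numpy as np

def project(P, pts_xyz):
    """Project Nx3 world points with a 3x4 camera matrix; returns Nx2 pixels."""
    homog = np.hstack([pts_xyz, np.ones((len(pts_xyz), 1))])
    uvw = homog @ P.T
    return uvw[:, :2] / uvw[:, 2:3]

def plane_homography(P, height):
    """Homography taking (X, Y, 1) on the world plane Z = height to the image."""
    return np.column_stack([P[:, 0], P[:, 1], height * P[:, 2] + P[:, 3]])

def cylinder_mask(P, center_xy, radius, height, image_hw, n=16):
    """Boolean mask covering the bounding box of a projected 3D cylinder.

    Assumption: the occluded region is approximated by the bounding box of
    the projected top and bottom rings; all points must lie in front of the
    camera.
    """
    angles = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    ring = np.stack([center_xy[0] + radius * np.cos(angles),
                     center_xy[1] + radius * np.sin(angles)], axis=1)
    bottom = np.hstack([ring, np.zeros((n, 1))])
    top = np.hstack([ring, np.full((n, 1), height)])
    uv = project(P, np.vstack([bottom, top]))
    h, w = image_hw
    u0, v0 = np.floor(uv.min(axis=0)).astype(int)
    u1, v1 = np.ceil(uv.max(axis=0)).astype(int)
    u0, u1 = np.clip([u0, u1], 0, w)
    v0, v1 = np.clip([v0, v1], 0, h)
    mask = np.zeros((h, w), dtype=bool)
    mask[v0:v1, u0:u1] = True
    return mask

# Hypothetical camera: 2 m above the ground at (0, -5, 2), looking along +Y.
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
R = np.array([[1.0, 0.0, 0.0], [0.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
t = np.array([0.0, 2.0, 5.0])
P = K @ np.hstack([R, t[:, None]])

# Random occlusion: sample a ground position, then mask it in this view.
rng = np.random.default_rng(0)
center = rng.uniform(-1.0, 1.0, size=2)
mask = cylinder_mask(P, center, radius=0.25, height=1.8, image_hw=(480, 640))

# Multi-layer projection: one homography per plane height (heights assumed).
homographies = [plane_homography(P, z) for z in (0.0, 0.6, 1.2, 1.8)]
```

In a multi-camera setup the same sampled cylinder would be passed to `cylinder_mask` once per camera matrix, so that the occlusion is geometrically consistent across views. The inverse of each homography is what would warp a view's feature map onto the corresponding plane.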

Results

Tasks: Object Detection, 3D, 3D Object Detection, 2D Classification, 2D Object Detection, 16k (identical results are listed under each task)

| Dataset    | Metric         | Value | Model |
|------------|----------------|-------|-------|
| Wildtrack  | MODA           | 93.5  | 3DROM |
| Wildtrack  | MODP           | 75.9  | 3DROM |
| Wildtrack  | Recall         | 96.2  | 3DROM |
| CityStreet | F1_score (2m)  | 79.2  | 3DROM |
| CityStreet | MODA (2m)      | 60    | 3DROM |
| CityStreet | MODP (2m)      | 70.1  | 3DROM |
| CityStreet | Precision (2m) | 82.5  | 3DROM |
| CityStreet | Recall (2m)    | 76.2  | 3DROM |
| CVCS       | F1_score (1m)  | 55.1  | 3DROM |
| CVCS       | MODA (1m)      | 33.9  | 3DROM |
| CVCS       | MODP (1m)      | 73.9  | 3DROM |
| CVCS       | Precision (1m) | 79.5  | 3DROM |
| CVCS       | Recall (1m)    | 42.2  | 3DROM |
| MultiviewX | MODA           | 90    | 3DROM |
| MultiviewX | MODP           | 83.7  | 3DROM |
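
As a quick consistency check on the CityStreet and CVCS rows, the reported F1 scores can be recomputed from the reported precision and recall. This sketch assumes the benchmark uses the standard harmonic-mean definition of F1; the function name `f1_score` and the `reported` dictionary are illustrative only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2.0 * precision * recall / (precision + recall)

# (precision, recall, reported F1) taken from the results listed above.
reported = {
    "CityStreet (2m)": (82.5, 76.2, 79.2),
    "CVCS (1m)": (79.5, 42.2, 55.1),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: computed {f1_score(p, r):.1f}, reported {f1}")
```

Under that assumption, both recomputed values agree with the listed F1 scores to one decimal place.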

Related Papers

Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management (2025-07-17)
Pixel Perfect MegaMed: A Megapixel-Scale Vision-Language Foundation Model for Generating High Resolution Medical Images (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)
Data Augmentation in Time Series Forecasting through Inverted Framework (2025-07-15)
Iceberg: Enhancing HLS Modeling with Synthetic Data (2025-07-14)
AI-Enhanced Pediatric Pneumonia Detection: A CNN-Based Approach Using Data Augmentation and Generative Adversarial Networks (GANs) (2025-07-13)
FreeAudio: Training-Free Timing Planning for Controllable Long-Form Text-to-Audio Generation (2025-07-11)
DS@GT at CheckThat! 2025: Detecting Subjectivity via Transfer-Learning and Corrective Data Augmentation (2025-07-08)