Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Mahalanobis Distance-based Multi-view Optimal Transport for Multi-view Crowd Localization

Qi Zhang, Kaiyi Zhang, Antoni B. Chan, Hui Huang

2024-09-03 · Multiview Detection

Abstract

Multi-view crowd localization predicts the ground locations of all people in the scene. Typical methods first estimate crowd density maps on the ground plane and then extract the crowd locations from them. However, the performance of existing methods is limited by the ambiguity of the density maps in crowded areas, where local peaks can be smoothed away. To mitigate the weakness of density map supervision, optimal transport-based point supervision methods have been proposed for single-image crowd localization, but they have not yet been explored for multi-view crowd localization. Thus, in this paper, we propose a novel Mahalanobis distance-based multi-view optimal transport (M-MVOT) loss specifically designed for multi-view crowd localization. First, we replace the Euclidean-based transport cost with the Mahalanobis distance, which defines elliptical iso-contours in the cost function whose long-axis and short-axis directions are guided by the view ray direction. Second, the object-to-camera distance in each view is used to further adjust the optimal transport cost of each location, so that wrong predictions far from the camera are penalized more heavily. Finally, we propose a strategy to consider all the input camera views in the model loss (M-MVOT) by computing the optimal transport cost for each ground-truth point based on its closest camera. Experiments demonstrate the advantage of the proposed method over density map-based supervision and the common Euclidean distance-based optimal transport loss on several multi-view crowd localization datasets. Project page: https://vcc.tech/research/2024/MVOT.
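
The three ingredients described in the abstract (a Mahalanobis cost aligned with the view ray, a distance-dependent adjustment, and closest-camera selection) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The function name `mvot_cost_matrix`, the parameters `sigma_ray`, `sigma_perp`, and `alpha`, the choice of which axis lies along the ray, and the linear form of the distance weighting are all assumptions made for illustration, since the abstract does not specify them.

```python
import numpy as np


def mvot_cost_matrix(pred, gt, cams, sigma_ray=2.0, sigma_perp=1.0, alpha=0.1):
    """Illustrative transport cost between predicted and ground-truth points.

    pred: (P, 2) predicted ground-plane locations
    gt:   (G, 2) ground-truth ground-plane locations
    cams: (C, 2) camera positions projected onto the ground plane
    sigma_ray, sigma_perp: hypothetical axis scales of the cost ellipse
        along and across the view ray (assumed values, not from the paper)
    alpha: hypothetical strength of the distance-based weighting
    Returns a (P, G) cost matrix.
    """
    pred = np.asarray(pred, dtype=float)
    gt = np.asarray(gt, dtype=float)
    cams = np.asarray(cams, dtype=float)

    # For each ground-truth point, pick its closest camera.
    cam_to_gt = gt[:, None, :] - cams[None, :, :]      # (G, C, 2)
    cam_dist = np.linalg.norm(cam_to_gt, axis=2)       # (G, C)
    nearest = cam_dist.argmin(axis=1)                  # (G,)
    rows = np.arange(len(gt))
    ray = cam_to_gt[rows, nearest]                     # (G, 2) view ray
    dist = cam_dist[rows, nearest]                     # (G,) object-to-camera

    # Unit vectors along and perpendicular to the view ray.
    u = ray / np.maximum(dist, 1e-8)[:, None]
    v = np.stack([-u[:, 1], u[:, 0]], axis=1)

    # Squared Mahalanobis distance with an ellipse aligned to the ray.
    diff = pred[:, None, :] - gt[None, :, :]           # (P, G, 2)
    along = np.einsum("pgd,gd->pg", diff, u)
    across = np.einsum("pgd,gd->pg", diff, v)
    maha_sq = (along / sigma_ray) ** 2 + (across / sigma_perp) ** 2

    # Assumed linear weighting: costs grow with distance from the camera.
    return maha_sq * (1.0 + alpha * dist)[None, :]


cost = mvot_cost_matrix(
    pred=[[11.0, 0.0], [10.0, 1.0]],
    gt=[[10.0, 0.0]],
    cams=[[0.0, 0.0]],
)
```

With these assumed scales, a 1-unit error along the view ray costs less than a 1-unit error across it. In a full loss, a matrix like `cost` would be passed to an optimal transport solver, which is omitted here.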

Results

The same five results are listed under each of the following task labels: Object Detection, 3D, 3D Object Detection, 2D Classification, 2D Object Detection, 16k.

Dataset      Metric        Value   Model
Wildtrack    MODA          92.1    M-MVOT
CVCS         MODA (0.5m)   43.5    M-MVOT
MultiviewX   MODA          96.7    M-MVOT
MultiviewX   MODP          86.1    M-MVOT
MultiviewX   Recall        97.9    M-MVOT
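
For readers unfamiliar with the metrics, MODA and Recall can be sketched as below, using the standard definitions MODA = 1 - (FN + FP) / N_gt and Recall = TP / (TP + FN). The function names and the counts in the usage lines are hypothetical and are not taken from the paper. The step that matches predictions to ground truth within a distance threshold is assumed to have been done already and is not shown.

```python
def moda(false_negatives, false_positives, num_gt):
    """Multiple Object Detection Accuracy from error counts (fraction, not %)."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    return 1.0 - (false_negatives + false_positives) / num_gt


def recall(true_positives, false_negatives):
    """Fraction of ground-truth points that were matched by a prediction."""
    total = true_positives + false_negatives
    if total == 0:
        raise ValueError("no ground-truth points")
    return true_positives / total


# Hypothetical counts for illustration only.
example_moda = moda(false_negatives=5, false_positives=3, num_gt=100)
example_recall = recall(true_positives=95, false_negatives=5)
```

Both functions return fractions; the table reports the corresponding percentages.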

Related Papers

Resource-Efficient Multiview Perception: Integrating Semantic Masking with Masked Autoencoders (2024-10-07)
Multi-View People Detection in Large Scenes via Supervised View-Wise Contribution Weighting (2024-05-30)
Lifting Multi-View Detection and Tracking to the Bird's Eye View (2024-03-19)
Enhancing Multi-View Pedestrian Detection Through Generalized 3D Feature Pulling (2023-12-20)
EarlyBird: Early-Fusion for Multi-View Tracking in the Bird's Eye View (2023-10-20)
Leveraging Multi-view Data for Improved Detection Performance: An Industrial Use Case (2023-04-17)
Multi-view Tracking Using Weakly Supervised Human Motion Prediction (2022-10-19)
Booster-SHOT: Boosting Stacked Homography Transformations for Multiview Pedestrian Detection with Attention (2022-08-19)