
Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation)

Yunzhong Hou, Liang Zheng

2021-08-12 | Multiview Detection | Data Augmentation | Translation

Abstract

Multiview detection incorporates multiple camera views to deal with occlusions, and its central problem is multiview aggregation. Given feature map projections from multiple views onto a common ground plane, the state-of-the-art method addresses this problem via convolution, which applies the same calculation regardless of object locations. However, such translation-invariant behavior might not be the best choice, as object features undergo various projection distortions according to their positions and cameras. In this paper, we propose a novel multiview detector, MVDeTr, that adopts a newly introduced shadow transformer to aggregate multiview information. Unlike convolutions, the shadow transformer attends differently at different positions and cameras to deal with various shadow-like distortions. We propose an effective training scheme that includes a new view-coherent data augmentation method, which applies random augmentations while maintaining multiview consistency. On two multiview detection benchmarks, we report new state-of-the-art accuracy with the proposed system. Code is available at https://github.com/hou-yz/MVDeTr.
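
The idea of applying random augmentations while maintaining multiview consistency can be illustrated with a small sketch. It assumes each camera has an image-to-ground homography and that the augmentation is an affine transform of pixel coordinates; composing the homography with the inverse of the augmentation keeps every pixel mapped to the same ground-plane location. The matrix `H`, the parameter ranges, and the function names are hypothetical and are not taken from the MVDeTr code, whose actual implementation may differ.

```python
import numpy as np

def random_affine(rng, max_shift=20.0, scale_range=(0.8, 1.2)):
    """Return a 3x3 matrix that randomly scales and translates pixel coordinates."""
    s = rng.uniform(*scale_range)
    tx, ty = rng.uniform(-max_shift, max_shift, size=2)
    return np.array([[s, 0.0, tx],
                     [0.0, s, ty],
                     [0.0, 0.0, 1.0]])

def project(H, xy):
    """Apply a 3x3 homography to a 2D point and dehomogenize."""
    p = H @ np.array([xy[0], xy[1], 1.0])
    return p[:2] / p[2]

def augment_view(H_img_to_ground, A):
    """Homography for the augmented image: undo A, then project as before."""
    return H_img_to_ground @ np.linalg.inv(A)

rng = np.random.default_rng(0)

# Hypothetical image-to-ground homography for one camera (illustrative values).
H = np.array([[0.02, 0.001, -3.0],
              [0.0005, 0.05, -8.0],
              [0.0, 0.0004, 1.0]])

A = random_affine(rng)            # augmentation applied to this view's image
H_aug = augment_view(H, A)        # matching update to the view's projection

pixel = np.array([320.0, 240.0])  # a pixel in the original image
pixel_aug = project(A, pixel)     # where that pixel lands after augmentation

ground_before = project(H, pixel)
ground_after = project(H_aug, pixel_aug)
print(np.allclose(ground_before, ground_after))
```

Because each view gets its own random `A` and a correspondingly updated homography, the views can be augmented independently while still agreeing on the ground plane.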

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Object Detection | Wildtrack | MODA | 91.5 | MVDeTr |
| Object Detection | Wildtrack | MODP | 82.1 | MVDeTr |
| Object Detection | Wildtrack | Recall | 94 | MVDeTr |
| Object Detection | CityStreet | F1_score (2m) | 75.2 | MVDeTr |
| Object Detection | CityStreet | MODA (2m) | 58.3 | MVDeTr |
| Object Detection | CityStreet | MODP (2m) | 74.1 | MVDeTr |
| Object Detection | CityStreet | Precision (2m) | 92.8 | MVDeTr |
| Object Detection | CityStreet | Recall (2m) | 63.2 | MVDeTr |
| Object Detection | CVCS | F1_score (1m) | 61 | MVDeTr |
| Object Detection | CVCS | MODA (1m) | 39.8 | MVDeTr |
| Object Detection | CVCS | MODP (1m) | 84.1 | MVDeTr |
| Object Detection | CVCS | Precision (1m) | 95.3 | MVDeTr |
| Object Detection | CVCS | Recall (1m) | 44.9 | MVDeTr |
| Object Detection | MultiviewX | MODA | 93.7 | MVDeTr |
| Object Detection | MultiviewX | MODP | 91.3 | MVDeTr |
| Object Detection | MultiviewX | Recall | 94.2 | MVDeTr |

The same values are also listed under the task labels 3D, 3D Object Detection, 2D Classification, 2D Object Detection, and 16k.
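
As a quick sanity check on how the listed metrics relate, the sketch below recomputes F1 from the reported precision and recall, and shows the commonly used count-based definitions of precision, recall, and MODA. The function names and the example counts are hypothetical; the evaluation protocol behind the table (for example, the matching distance or any per-scene averaging) is not described on this page, so this is an illustration rather than the benchmark's evaluation code.

```python
def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2.0 * precision * recall / (precision + recall)

def detection_metrics(tp, fp, fn):
    """Count-based metrics, assuming MODA = 1 - (FP + FN) / GT with GT = TP + FN."""
    gt = tp + fn
    return {
        "precision": tp / (tp + fp),
        "recall": tp / gt,
        "moda": 1.0 - (fp + fn) / gt,
    }

# Reported precision/recall pairs from the table (in percent).
cityscape_f1 = round(f1_from_pr(92.8, 63.2), 1)  # CityStreet, 2m
cvcs_f1 = round(f1_from_pr(95.3, 44.9), 1)       # CVCS, 1m
print(cityscape_f1, cvcs_f1)

# Hypothetical counts, only to show the definitions in action.
example = detection_metrics(tp=90, fp=5, fn=10)
print(example)
```

The recomputed F1 values agree with the table entries for CityStreet (75.2) and CVCS (61) after rounding to one decimal place.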
