Yunzhong Hou, Liang Zheng
Multiview detection incorporates multiple camera views to deal with occlusions, and its central problem is multiview aggregation. Given feature map projections from multiple views onto a common ground plane, the state-of-the-art method addresses this problem via convolution, which applies the same calculation regardless of object locations. However, such translation-invariant behavior might not be the best choice, as object features undergo various projection distortions according to their positions and cameras. In this paper, we propose a novel multiview detector, MVDeTr, that adopts a newly introduced shadow transformer to aggregate multiview information. Unlike convolutions, the shadow transformer attends differently at different positions and cameras to deal with various shadow-like distortions. We propose an effective training scheme that includes a new view-coherent data augmentation method, which applies random augmentations while maintaining multiview consistency. On two multiview detection benchmarks, we report new state-of-the-art accuracy with the proposed system. Code is available at https://github.com/hou-yz/MVDeTr.
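One way to picture "random augmentations while maintaining multiview consistency" is the sketch below. It is an illustration under stated assumptions, not the authors' implementation: each view is assumed to have a 3x3 homography from image pixels to the ground plane, the augmentation is assumed to be a random scale and shift, and all function names and the example matrix `H` are hypothetical. The idea is that when a view is augmented, its projection is composed with the inverse of the augmentation, so an augmented pixel still lands on the same ground-plane location.

```python
import random

def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def apply_h(h, x, y):
    """Apply a 3x3 homography to a 2D point and dehomogenize."""
    u = h[0][0] * x + h[0][1] * y + h[0][2]
    v = h[1][0] * x + h[1][1] * y + h[1][2]
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return u / w, v / w

def random_scale_shift(rng, max_shift=50.0):
    """Sample a hypothetical augmentation p' = s * p + t and its inverse."""
    s = rng.uniform(0.8, 1.2)
    tx = rng.uniform(-max_shift, max_shift)
    ty = rng.uniform(-max_shift, max_shift)
    aug = [[s, 0.0, tx], [0.0, s, ty], [0.0, 0.0, 1.0]]
    aug_inv = [[1 / s, 0.0, -tx / s], [0.0, 1 / s, -ty / s], [0.0, 0.0, 1.0]]
    return aug, aug_inv

def coherent_projection(img_to_ground, aug_inv):
    """Projection for the augmented view: undo the augmentation, then project."""
    return matmul3(img_to_ground, aug_inv)

# Hypothetical image-to-ground homography for one camera (made-up numbers).
H = [[0.01, 0.002, -1.0], [0.001, 0.02, -2.0], [0.0, 0.0005, 1.0]]

rng = random.Random(0)
aug, aug_inv = random_scale_shift(rng)
H_aug = coherent_projection(H, aug_inv)

# A pixel in the original view and the same pixel after augmentation
# should project to the same ground-plane point.
px, py = 320.0, 240.0
ground_before = apply_h(H, px, py)
ground_after = apply_h(H_aug, *apply_h(aug, px, py))
```

Applying an independent random augmentation per camera and updating each camera's projection in this way keeps the views aligned on the ground plane, which is the consistency property the abstract refers to.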
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Object Detection | Wildtrack | MODA | 91.5 | MVDeTr |
| Object Detection | Wildtrack | MODP | 82.1 | MVDeTr |
| Object Detection | Wildtrack | Recall | 94 | MVDeTr |
| Object Detection | CityStreet | F1_score (2m) | 75.2 | MVDeTr |
| Object Detection | CityStreet | MODA (2m) | 58.3 | MVDeTr |
| Object Detection | CityStreet | MODP (2m) | 74.1 | MVDeTr |
| Object Detection | CityStreet | Precision (2m) | 92.8 | MVDeTr |
| Object Detection | CityStreet | Recall (2m) | 63.2 | MVDeTr |
| Object Detection | CVCS | F1_score (1m) | 61 | MVDeTr |
| Object Detection | CVCS | MODA (1m) | 39.8 | MVDeTr |
| Object Detection | CVCS | MODP (1m) | 84.1 | MVDeTr |
| Object Detection | CVCS | Precision (1m) | 95.3 | MVDeTr |
| Object Detection | CVCS | Recall (1m) | 44.9 | MVDeTr |
| Object Detection | MultiviewX | MODA | 93.7 | MVDeTr |
| Object Detection | MultiviewX | MODP | 91.3 | MVDeTr |
| Object Detection | MultiviewX | Recall | 94.2 | MVDeTr |
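
For readers unfamiliar with the metrics in the table above, the sketch below shows how precision, recall, F1, and MODA relate to detection counts. It uses the standard definition MODA = 1 - (false positives + misses) / ground-truth count; the counts in the example are hypothetical, the function names are illustrative, and the benchmarks' official evaluation code may aggregate differently (for example per frame). MODP, which measures localization precision of matched detections, is not covered here.

```python
def detection_scores(tp, fp, fn):
    """Compute precision, recall, F1 and MODA from detection counts.

    tp: matched detections, fp: false positives, fn: missed ground truths.
    Assumes tp > 0 so that no denominator is zero.
    """
    gt = tp + fn                      # number of ground-truth objects
    precision = tp / (tp + fp)
    recall = tp / gt
    f1 = 2 * precision * recall / (precision + recall)
    moda = 1.0 - (fp + fn) / gt       # penalizes both misses and false positives
    return {"precision": precision, "recall": recall, "f1": f1, "moda": moda}

def f1_from_pr(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)

# Hypothetical counts, not taken from any benchmark.
scores = detection_scores(tp=90, fp=5, fn=10)

# Consistency check against the CityStreet row of the table:
# precision 92.8 and recall 63.2 give an F1 that rounds to 75.2.
citystreet_f1 = round(f1_from_pr(92.8, 63.2), 1)
```

Because MODA subtracts both error types from 1, it can be much lower than precision when recall is low, which matches the pattern in the CityStreet and CVCS rows.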