Qi Zhang, Yunfei Gong, Daijie Chen, Antoni B. Chan, Hui Huang
Recent deep learning-based multi-view people detection (MVD) methods have shown promising results on existing datasets. However, current methods are mainly trained and evaluated on small, single scenes with a limited number of multi-view frames and fixed camera views. As a result, these methods may not be practical for detecting people in larger, more complex scenes with severe occlusions and camera calibration errors. This paper focuses on improving multi-view people detection by developing a supervised view-wise contribution weighting approach that better fuses multi-camera information in large scenes. In addition, a large synthetic dataset is adopted to enhance the model's generalization ability and enable more practical evaluation and comparison. The model's performance on new testing scenes is further improved with a simple domain adaptation technique. Experimental results demonstrate the effectiveness of our approach in achieving promising cross-scene multi-view people detection performance. Code is available at https://vcc.tech/research/2024/MVD.
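To make the idea of view-wise contribution weighting more concrete, here is a minimal sketch of weighted multi-view fusion. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the weights are computed or normalized, so the per-location softmax over views, the single-channel ground-plane maps, and the name `fuse_views` are all hypothetical.

```python
import math


def fuse_views(features, scores):
    """Fuse per-view ground-plane maps using per-location weights.

    features, scores: lists of V maps, each an H x W list of floats.
    Assumption: weights are a softmax over views at each location;
    the source does not state the actual normalization.
    """
    num_views = len(features)
    height, width = len(features[0]), len(features[0][0])
    fused = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Softmax over views at this location (shifted for stability).
            logits = [scores[v][y][x] for v in range(num_views)]
            peak = max(logits)
            exps = [math.exp(s - peak) for s in logits]
            total = sum(exps)
            # Weighted sum of the view features at this location.
            fused[y][x] = sum(
                (exps[v] / total) * features[v][y][x]
                for v in range(num_views)
            )
    return fused


# Two views, one row of two ground-plane cells each.
features = [[[1.0, 2.0]], [[3.0, 6.0]]]
equal_scores = [[[0.0, 0.0]], [[0.0, 0.0]]]
fused_equal = fuse_views(features, equal_scores)
```

With equal scores the fusion reduces to a plain average of the views; a view with a much higher score at a location dominates the fused value there, which is one way a view with a clearer line of sight could contribute more than an occluded one.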
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Object Detection | CityStreet | F1_score (2m) | 76 | SVCW |
| Object Detection | CityStreet | MODA (2m) | 55 | SVCW |
| Object Detection | CityStreet | MODP (2m) | 70 | SVCW |
| Object Detection | CityStreet | Precision (2m) | 81.4 | SVCW |
| Object Detection | CityStreet | Recall (2m) | 71.2 | SVCW |
| Object Detection | CVCS | F1_score (1m) | 68.4 | SVCW |
| Object Detection | CVCS | MODA (1m) | 46.2 | SVCW |
| Object Detection | CVCS | MODP (1m) | 78.4 | SVCW |
| Object Detection | CVCS | Precision (1m) | 81.2 | SVCW |
| Object Detection | CVCS | Recall (1m) | 59.1 | SVCW |
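
As a quick sanity check on the table, the reported F1 scores can be recomputed from the listed precision and recall using the standard harmonic-mean definition, F1 = 2PR / (P + R). The sketch below uses only values from the table; the helper name `f1_score` is chosen here for illustration.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) in percent, taken from the table.
reported = {
    "CityStreet (2m)": (81.4, 71.2, 76.0),
    "CVCS (1m)": (81.2, 59.1, 68.4),
}

for name, (p, r, f1) in reported.items():
    print(f"{name}: computed F1 = {f1_score(p, r):.1f}, reported = {f1}")
# CityStreet (2m): computed F1 = 76.0, reported = 76.0
# CVCS (1m): computed F1 = 68.4, reported = 68.4
```

Both datasets agree with the reported values to one decimal place.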