Zehua Cheng, Yuxiang Wu, Zhenghua Xu, Thomas Lukasiewicz, Weiyang Wang
Region proposal mechanisms are essential for existing deep learning approaches to object detection in images. Although they generally achieve good detection performance under normal circumstances, their recall in scenes with extreme cases is unacceptably low. This is mainly because bounding box annotations contain a great deal of environmental noise, and non-maximum suppression (NMS) is required to select the target boxes. Therefore, in this paper, we propose the first anchor-free and NMS-free object detection model, called weakly supervised multimodal annotation segmentation (WSMA-Seg), which uses segmentation models to achieve accurate and robust object detection without NMS. In WSMA-Seg, multimodal annotations are proposed to achieve instance-aware segmentation using weakly supervised bounding boxes; we also develop a run-data-based following algorithm to trace the contours of objects. In addition, we propose multi-scale pooling segmentation (MSP-Seg) as the underlying segmentation model of WSMA-Seg to achieve more accurate segmentation and to improve the detection accuracy of WSMA-Seg. Experimental results on multiple datasets, including COCO and WIDER Face, show that the proposed WSMA-Seg approach outperforms state-of-the-art detectors.
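Conventional detectors emit many overlapping candidate boxes and prune them with NMS; WSMA-Seg instead reads boxes off a segmentation map. The sketch below is a hypothetical, simplified illustration of that idea, not the authors' implementation. It assumes that weak box labels are rasterised into an "interior" mask and a "boundary" mask (loosely standing in for the paper's multimodal annotations), that touching instances are separated by removing boundary pixels, and it uses breadth-first connected-component labelling as a swapped-in substitute for the paper's run-data-based following (contour-tracing) algorithm. The names `boxes_to_masks`, `components_to_boxes`, `detect_without_nms`, and the `thickness` parameter are illustrative only.

```python
from collections import deque
from typing import List, Tuple

# (x_min, y_min, x_max, y_max), inclusive pixel coordinates
Box = Tuple[int, int, int, int]
Mask = List[List[int]]


def boxes_to_masks(boxes: List[Box], height: int, width: int,
                   thickness: int = 1) -> Tuple[Mask, Mask]:
    """Rasterise weak box labels into an 'interior' and a 'boundary' mask.

    Hypothetical stand-in for multimodal annotations: interior = every pixel
    inside some box; boundary = pixels within `thickness` of a box edge.
    """
    interior = [[0] * width for _ in range(height)]
    boundary = [[0] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        for y in range(y0, y1 + 1):
            for x in range(x0, x1 + 1):
                interior[y][x] = 1
                if min(x - x0, x1 - x, y - y0, y1 - y) < thickness:
                    boundary[y][x] = 1
    return interior, boundary


def components_to_boxes(mask: Mask) -> List[Box]:
    """One box per 4-connected foreground region (BFS, row-major scan order).

    Swapped-in technique: plain connected-component labelling, not the
    paper's run-data-based following algorithm.
    """
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    boxes: List[Box] = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            seen[sy][sx] = True
            queue = deque([(sy, sx)])
            x0 = x1 = sx
            y0 = y1 = sy
            while queue:
                y, x = queue.popleft()
                x0, x1 = min(x0, x), max(x1, x)
                y0, y1 = min(y0, y), max(y1, y)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            boxes.append((x0, y0, x1, y1))
    return boxes


def detect_without_nms(interior: Mask, boundary: Mask, thickness: int = 1) -> List[Box]:
    """Remove boundary pixels to split touching instances, then regrow each box."""
    h, w = len(interior), len(interior[0])
    core = [[int(interior[y][x] and not boundary[y][x]) for x in range(w)]
            for y in range(h)]
    return [(max(0, x0 - thickness), max(0, y0 - thickness),
             min(w - 1, x1 + thickness), min(h - 1, y1 + thickness))
            for x0, y0, x1, y1 in components_to_boxes(core)]


# Usage: two touching objects merge in the interior mask alone,
# but are recovered separately once boundary pixels are removed.
boxes = [(0, 0, 3, 3), (4, 0, 8, 4)]
interior, boundary = boxes_to_masks(boxes, height=6, width=10)
print(components_to_boxes(interior))           # [(0, 0, 8, 4)]
print(detect_without_nms(interior, boundary))  # [(0, 0, 3, 3), (4, 0, 8, 4)]
```

Because each connected region yields exactly one box, no duplicate candidates arise and no suppression step is needed. The regrow step only approximately undoes the boundary removal: a box narrower than `2 * thickness + 1` pixels leaves no core and would be lost. In WSMA-Seg the maps would be predicted by the segmentation network (MSP-Seg); here they are built from boxes only to keep the example self-contained.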
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Object Detection | COCO test-dev | box mAP | 38.1 | WSMA-Seg |
| Face Detection | WIDER Face (Medium) | AP | 0.9341 | WSMA-Seg |
| Face Detection | WIDER Face (Hard) | AP | 0.8723 | WSMA-Seg |