Robust and efficient post-processing for video object detection

Alberto Sabater, Luis Montesano, Ana C. Murillo

2020-09-23

Video Object Detection, Object Recognition, Autonomous Driving, Object Detection

Abstract

Object recognition in video is an important task for many applications, including autonomous driving perception, surveillance, wearable devices, and IoT networks. Recognizing objects in video is more challenging than in still images because of blur, occlusions, and rare object poses. The current state of the art is achieved either by video-specific detectors with high computational cost or by standard image detectors combined with a fast post-processing algorithm. This work introduces a novel post-processing pipeline that overcomes some of the limitations of previous post-processing methods by introducing a learning-based similarity evaluation between detections across frames. Our method improves the results of state-of-the-art video-specific detectors, especially for fast-moving objects, and has low resource requirements. Applied to efficient still-image detectors such as YOLO, it provides results comparable to those of much more computationally intensive detectors.
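To make the idea of linking detections across frames concrete, the following is a minimal sketch and not the authors' implementation. It greedily pairs boxes from two consecutive frames using a pluggable similarity function. The names `iou`, `link_frames`, and the `threshold` value are assumptions chosen for illustration, and plain IoU stands in here for the learning-based similarity the abstract describes.

```python
from typing import Callable, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_frames(
    prev: List[Box],
    curr: List[Box],
    similarity: Callable[[Box, Box], float] = iou,  # stand-in for a learned score
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> List[Tuple[int, int]]:
    """Greedily match boxes in `prev` to boxes in `curr`, best score first."""
    scored = sorted(
        ((similarity(p, c), i, j)
         for i, p in enumerate(prev)
         for j, c in enumerate(curr)),
        reverse=True,
    )
    used_prev, used_curr, links = set(), set(), []
    for score, i, j in scored:
        if score < threshold:
            break  # remaining pairs are all below the cut-off
        if i in used_prev or j in used_curr:
            continue  # each detection is linked at most once
        used_prev.add(i)
        used_curr.add(j)
        links.append((i, j))
    return sorted(links)


prev_boxes = [(0, 0, 10, 10), (20, 20, 30, 30)]
curr_boxes = [(21, 21, 31, 31), (1, 1, 11, 11)]
links = link_frames(prev_boxes, curr_boxes)
```

In this sketch, swapping `similarity` for a trained model that scores pairs of detections would be the point where a learned measure replaces the purely geometric one. Chaining the resulting links over many frames would yield the per-object tracks that a post-processing step can then rescore.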

Results

Task	Dataset	Metric	Value	Model
Object Detection	ImageNet VID	mAP	84.2	REPP + SELSA (ResNet-101)
Object Detection	ImageNet VID	mAP	80.1	REPP + FGFA
Object Detection	ImageNet VID	mAP	75.1	REPP + YOLOv3
Object Detection	ImageNet VID	mAP	68.6	YOLOv3
