Objects do not disappear: Video object detection by single-frame object location anticipation

Xin Liu, Fatemeh Karimi Nejadasl, Jan C. van Gemert, Olaf Booij, Silvia L. Pintea

2023-08-09, ICCV 2023. Tasks: Video Object Detection, Object Detection

Abstract

Objects in videos are typically characterized by continuous smooth motion. We exploit continuous smooth motion in three ways. 1) Improved accuracy, by using object motion as an additional source of supervision, which we obtain by anticipating object locations from a static keyframe. 2) Improved efficiency, by doing the expensive feature computations on only a small subset of all frames. Because neighboring video frames are often redundant, we compute features for a single static keyframe and predict object locations in subsequent frames. 3) Reduced annotation cost, because we annotate only the keyframe and use smooth pseudo-motion between keyframes. We demonstrate computational efficiency, annotation efficiency, and improved mean average precision compared to the state of the art on four datasets: ImageNet VID, EPIC KITCHENS-55, YouTube-BoundingBoxes, and Waymo Open Dataset. Our source code is available at https://github.com/L-KID/Videoobject-detection-by-location-anticipation.
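A minimal sketch of the annotation-side idea in point 3, under stated assumptions: keyframes are sampled at a fixed stride, and "smooth pseudo-motion" is taken to be linear interpolation of box coordinates for a single object between two annotated keyframes. The names `keyframe_indices`, `lerp_box`, `pseudo_motion`, `Box`, and the stride value are hypothetical illustrations, not the authors' API; their actual pseudo-motion scheme may differ (see the linked repository).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Box:
    """Axis-aligned box: (x1, y1) is top-left, (x2, y2) is bottom-right."""
    x1: float
    y1: float
    x2: float
    y2: float


def keyframe_indices(num_frames: int, stride: int) -> List[int]:
    """Hypothetical schedule: every `stride`-th frame is a keyframe.

    Only these frames would receive full feature computation and manual
    annotation. The last frame is appended so the final segment is closed.
    """
    if num_frames <= 0 or stride <= 0:
        raise ValueError("num_frames and stride must be positive")
    keys = list(range(0, num_frames, stride))
    if keys[-1] != num_frames - 1:
        keys.append(num_frames - 1)
    return keys


def lerp_box(a: Box, b: Box, t: float) -> Box:
    """Linearly interpolate each coordinate; t=0 returns a, t=1 returns b."""
    return Box(
        x1=(1 - t) * a.x1 + t * b.x1,
        y1=(1 - t) * a.y1 + t * b.y1,
        x2=(1 - t) * a.x2 + t * b.x2,
        y2=(1 - t) * a.y2 + t * b.y2,
    )


def pseudo_motion(key_boxes: Dict[int, Box]) -> Dict[int, Box]:
    """Assumed form of pseudo-motion for one object: fill every frame between
    consecutive annotated keyframes by linear interpolation."""
    if not key_boxes:
        return {}
    keys = sorted(key_boxes)
    filled: Dict[int, Box] = {}
    for k0, k1 in zip(keys, keys[1:]):
        for f in range(k0, k1):
            t = (f - k0) / (k1 - k0)
            filled[f] = lerp_box(key_boxes[k0], key_boxes[k1], t)
    # The last keyframe keeps its manual annotation unchanged.
    filled[keys[-1]] = key_boxes[keys[-1]]
    return filled


# Usage: a 9-frame clip with keyframes every 4 frames, one annotated object.
keys = keyframe_indices(num_frames=9, stride=4)  # [0, 4, 8]
annotations = {0: Box(0, 0, 10, 10), 4: Box(40, 0, 50, 10), 8: Box(40, 20, 50, 30)}
boxes = pseudo_motion(annotations)
print(boxes[2])  # Box(x1=20.0, y1=0.0, x2=30.0, y2=10.0)
```

In this sketch, only 3 of 9 frames carry manual boxes; the other 6 are derived. Linear interpolation is the simplest smooth motion model and is chosen here for clarity; a real pipeline would also need to match object identities across keyframes, which this sketch sidesteps by handling a single object.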

Results

Task	Dataset	Metric	Value	Model
Object Detection	EPIC-KITCHENS-55	mAP@.5	41.7	Ours (Faster RCNN)
Object Detection	Waymo Open Dataset	AP	59.28	
Object Detection	ImageNet VID	mAP	91.3	Ours (Def. DETR + SwinB)
Object Detection	ImageNet VID	mAP	87.9	Ours (Def. DETR + R101)
Object Detection	ImageNet VID	mAP	87.2	Ours (Faster RCNN + R101)
Object Detection	YT-BB	mAP	59.8	