Dance with Flow: Two-in-One Stream Action Detection

Jiaojiao Zhao, Cees G. M. Snoek

2019-04-01 | CVPR 2019 | Tasks: Action Detection, Optical Flow Estimation, Vocal Bursts Valence Prediction

Abstract

The goal of this paper is to detect the spatio-temporal extent of an action. The two-stream detection network based on RGB and flow provides state-of-the-art accuracy at the expense of a large model size and heavy computation. We propose to embed RGB and optical flow into a single two-in-one stream network with new layers. A motion condition layer extracts motion information from flow images, which the motion modulation layer uses to generate transformation parameters for modulating the low-level RGB features. The method is easily embedded in existing appearance-only or two-stream action detection networks and is trained end-to-end. Experiments demonstrate that leveraging the motion condition to modulate RGB features improves detection accuracy. With only half the computation and parameters of state-of-the-art two-stream methods, our two-in-one stream still achieves impressive results on UCF101-24, UCF Sports and J-HMDB.
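The interaction between the two new layers can be illustrated with a minimal sketch. It assumes that the transformation parameters are an elementwise scale and shift applied to the RGB feature map, and it uses 1x1 convolutions as stand-ins for the motion condition and motion modulation layers; the abstract specifies neither choice. All names here (conv1x1, relu, modulate, and the keys of params) are hypothetical.

```python
# Sketch only: an assumed scale-and-shift modulation, not the paper's exact layers.
# Feature maps are nested lists indexed as [channel][row][column].

def conv1x1(x, weight, bias):
    """1x1 convolution: mixes channels independently at every location.

    x is [C_in][H][W], weight is [C_out][C_in], bias is [C_out].
    """
    height, width = len(x[0]), len(x[0][0])
    return [
        [
            [
                bias[o] + sum(weight[o][i] * x[i][r][c] for i in range(len(x)))
                for c in range(width)
            ]
            for r in range(height)
        ]
        for o in range(len(weight))
    ]


def relu(x):
    """Elementwise max(0, v)."""
    return [[[max(0.0, v) for v in row] for row in channel] for channel in x]


def modulate(rgb_feat, flow, params):
    """Modulate low-level RGB features with parameters predicted from flow."""
    # Motion condition: extract motion information from the flow image.
    cond = relu(conv1x1(flow, params["cond_w"], params["cond_b"]))
    # Motion modulation: predict one scale and one shift per feature element
    # (assumed form of the "transformation parameters").
    scale = conv1x1(cond, params["scale_w"], params["scale_b"])
    shift = conv1x1(cond, params["shift_w"], params["shift_b"])
    # Apply the assumed affine transformation: scale * feature + shift.
    return [
        [
            [s * f + b for s, f, b in zip(s_row, f_row, b_row)]
            for s_row, f_row, b_row in zip(s_ch, f_ch, b_ch)
        ]
        for s_ch, f_ch, b_ch in zip(scale, rgb_feat, shift)
    ]


# Example: one RGB feature channel and a two-channel flow image, both 2x2.
rgb_feat = [[[1.0, 2.0], [3.0, 4.0]]]
flow = [[[0.5, -0.5], [1.0, 0.0]], [[0.0, 1.0], [-1.0, 2.0]]]
params = {
    "cond_w": [[1.0, 0.0]], "cond_b": [0.0],
    "scale_w": [[2.0]], "scale_b": [0.0],
    "shift_w": [[0.0]], "shift_b": [1.0],
}
modulated = modulate(rgb_feat, flow, params)
```

In this sketch a single network carries the RGB features forward while the flow input only supplies the modulation parameters, which is one way to read the claim that the two-in-one stream avoids a full second stream.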

Results

Task	Dataset	Metric	Value	Model
Activity Recognition	UCF101	3-fold Accuracy	92	Two-in-one Two Stream
Action Detection	UCF101-24	Video-mAP 0.2	78.48	Two-in-one Two Stream
Action Detection	UCF101-24	Video-mAP 0.5	50.3	Two-in-one Two Stream
Action Detection	UCF101-24	Video-mAP 0.2	75.48	Two-in-one
Action Detection	UCF101-24	Video-mAP 0.5	48.31	Two-in-one
Action Detection	UCF Sports	Video-mAP 0.5	96.52	Two-in-one Two Stream
Action Detection	UCF Sports	Video-mAP 0.5	92.74	Two-in-one
Action Detection	J-HMDB	Video-mAP 0.5	74.74	Two-in-one Two Stream
Action Detection	J-HMDB	Video-mAP 0.5	57.96	Two-in-one
Action Recognition	UCF101	3-fold Accuracy	92	Two-in-one Two Stream
