Real-time Action Recognition with Enhanced Motion Vector CNNs

Bowen Zhang, Li-Min Wang, Zhe Wang, Yu Qiao, Hanli Wang

2016-04-26CVPR 2016 6Optical Flow Estimation Action Recognition Temporal Action Localization

Abstract

The deep two-stream architecture exhibited excellent performance on video based action recognition. The most computationally expensive step in this approach comes from the calculation of optical flow which prevents it to be real-time. This paper accelerates this architecture by replacing optical flow with motion vector which can be obtained directly from compressed videos without extra calculation. However, motion vector lacks fine structures, and contains noisy and inaccurate motion patterns, leading to the evident degradation of recognition performance. Our key insight for relieving this problem is that optical flow and motion vector are inherent correlated. Transferring the knowledge learned with optical flow CNN to motion vector CNN can significantly boost the performance of the latter. Specifically, we introduce three strategies for this, initialization transfer, supervision transfer and their combination. Experimental results show that our method achieves comparable recognition performance to the state-of-the-art, while our method can process 390.7 frames per second, which is 27 times faster than the original two-stream method.

Results

Task	Dataset	Metric	Value	Model
Activity Recognition	UCF101	3-fold Accuracy	86.4	MV-CNN
Action Recognition	UCF101	3-fold Accuracy	86.4	MV-CNN

Related Papers

Channel-wise Motion Features for Efficient Motion Segmentation2025-07-17 A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains2025-07-17 DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition2025-07-16 An Efficient Approach for Muscle Segmentation and 3D Reconstruction Using Keypoint Tracking in MRI Scan2025-07-11 Learning to Track Any Points from Human Motion2025-07-08 TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation2025-07-07 Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment2025-07-01 MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation2025-06-29