
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


DMC-Net: Generating Discriminative Motion Cues for Fast Compressed Video Action Recognition

Zheng Shou, Xudong Lin, Yannis Kalantidis, Laura Sevilla-Lara, Marcus Rohrbach, Shih-Fu Chang, Zhicheng Yan

2019-01-11 · CVPR 2019

Tasks: Action Classification, Optical Flow Estimation, Video Understanding, Action Recognition, Action Recognition In Videos, Temporal Action Localization

Abstract

Motion has been shown to be useful for video understanding, where motion is typically represented by optical flow. However, computing flow from video frames is very time-consuming. Recent works directly leverage the motion vectors and residuals readily available in the compressed video to represent motion at no cost. While this avoids flow computation, it also hurts accuracy, since the motion vector is noisy and has substantially reduced resolution, which makes it a less discriminative motion representation. To remedy these issues, we propose a lightweight generator network, which reduces noise in motion vectors and captures fine motion details, achieving a more Discriminative Motion Cue (DMC) representation. Since optical flow is a more accurate motion representation, we train the DMC generator to approximate flow using a reconstruction loss and a generative adversarial loss, jointly with the downstream action classification task. Extensive evaluations on three action recognition benchmarks (HMDB-51, UCF-101, and a subset of Kinetics) confirm the effectiveness of our method. Our full system, consisting of the generator and the classifier, is coined DMC-Net; it obtains accuracy close to that of using flow and runs two orders of magnitude faster than using optical flow at inference time.

Results

Task                 | Dataset | Metric                       | Value | Model
---------------------|---------|------------------------------|-------|------------------------
Activity Recognition | HMDB-51 | Average accuracy of 3 splits | 77.8  | I3D RGB + DMC-Net (I3D)
Activity Recognition | HMDB-51 | Average accuracy of 3 splits | 71.8  | DMC-Net (I3D)
Activity Recognition | HMDB-51 | Average accuracy of 3 splits | 62.8  | DMC-Net (ResNet-18)
Activity Recognition | UCF-101 | 3-fold Accuracy              | 96.5  | I3D RGB + DMC-Net (I3D)
Activity Recognition | UCF-101 | 3-fold Accuracy              | 92.3  | DMC-Net (I3D)
Activity Recognition | UCF-101 | 3-fold Accuracy              | 90.9  | DMC-Net (ResNet-18)
Action Recognition   | HMDB-51 | Average accuracy of 3 splits | 77.8  | I3D RGB + DMC-Net (I3D)
Action Recognition   | HMDB-51 | Average accuracy of 3 splits | 71.8  | DMC-Net (I3D)
Action Recognition   | HMDB-51 | Average accuracy of 3 splits | 62.8  | DMC-Net (ResNet-18)
Action Recognition   | UCF-101 | 3-fold Accuracy              | 96.5  | I3D RGB + DMC-Net (I3D)
Action Recognition   | UCF-101 | 3-fold Accuracy              | 92.3  | DMC-Net (I3D)
Action Recognition   | UCF-101 | 3-fold Accuracy              | 90.9  | DMC-Net (ResNet-18)

Related Papers

- Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)
- VideoITG: Multimodal Video Understanding with Instructed Temporal Grounding (2025-07-17)
- A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
- DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
- UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks (2025-07-15)
- EmbRACE-3K: Embodied Reasoning and Action in Complex Environments (2025-07-14)
- Chat with AI: The Surprising Turn of Real-time Video Communication from Human to AI (2025-07-14)
- An Efficient Approach for Muscle Segmentation and 3D Reconstruction Using Keypoint Tracking in MRI Scan (2025-07-11)