Mutual Modality Learning for Video Action Classification

Stepan Komkov, Maksim Dzabraev, Aleksandr Petiushko

2020-11-04

Action Classification, Optical Flow Estimation, General Classification, Action Recognition, Classification

Abstract

Models for video action classification are progressing rapidly. However, their performance can still be easily improved by ensembling them with the same models trained on different modalities (e.g., optical flow). Unfortunately, using several modalities during inference is computationally expensive. Recent works examine ways to integrate the advantages of multi-modality into a single RGB model. Yet there is still room for improvement. In this paper, we explore various methods to embed the power of an ensemble into a single model. We show that proper initialization, as well as mutual modality learning, enhances single-modality models. As a result, we achieve state-of-the-art results on the Something-Something-v2 benchmark.
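To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes two classifiers (one on RGB, one on optical flow) that each output a vector of class logits. `ensemble_probs` shows a common late-fusion ensemble that averages class probabilities, and `mutual_loss` shows a generic mutual-learning objective (cross-entropy plus a KL term toward the peer model's prediction). The function names, the `kl_weight` parameter, and the toy logits are all hypothetical; the abstract does not specify the exact loss or fusion rule used in the paper.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def ensemble_probs(rgb_logits, flow_logits):
    """Late fusion (assumed): average the class probabilities of both models."""
    p_rgb = softmax(rgb_logits)
    p_flow = softmax(flow_logits)
    return [(a + b) / 2.0 for a, b in zip(p_rgb, p_flow)]


def mutual_loss(own_logits, peer_logits, label, kl_weight=1.0):
    """Generic mutual-learning loss (assumed form, not taken from the paper).

    Cross-entropy on the true label plus KL(peer || own), where the peer's
    prediction is treated as a fixed target for this model.
    """
    p_own = softmax(own_logits)
    p_peer = softmax(peer_logits)
    ce = -math.log(p_own[label])
    kl = sum(q * math.log(q / p) for q, p in zip(p_peer, p_own) if q > 0.0)
    return ce + kl_weight * kl


# Toy logits for a 3-class problem (illustrative values only).
rgb_logits = [2.0, 1.0, 0.1]
flow_logits = [0.5, 2.5, 0.3]

fused = ensemble_probs(rgb_logits, flow_logits)
predicted_class = max(range(len(fused)), key=fused.__getitem__)

# Loss for the RGB model when the flow model acts as its peer.
rgb_loss = mutual_loss(rgb_logits, flow_logits, label=1)
```

In this sketch the ensemble needs both models at inference time, which is the cost the abstract refers to. A mutual-learning term instead lets each single-modality model absorb information from the other during training, so that only one model has to run at inference.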

Results

Task	Dataset	Metric	Value	Model
Activity Recognition	Something-Something V2	Top-1 Accuracy	69.02	MML (ensemble)
Activity Recognition	Something-Something V2	Top-5 Accuracy	92.7	MML (ensemble)
Activity Recognition	Something-Something V2	Top-1 Accuracy	66.83	MML (single)
Activity Recognition	Something-Something V2	Top-5 Accuracy	91.3	MML (single)
Action Recognition	Something-Something V2	Top-1 Accuracy	69.02	MML (ensemble)
Action Recognition	Something-Something V2	Top-5 Accuracy	92.7	MML (ensemble)
Action Recognition	Something-Something V2	Top-1 Accuracy	66.83	MML (single)
Action Recognition	Something-Something V2	Top-5 Accuracy	91.3	MML (single)
