Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Mutual Modality Learning for Video Action Classification

Stepan Komkov, Maksim Dzabraev, Aleksandr Petiushko

2020-11-04
Tasks: Action Classification, Optical Flow Estimation, General Classification, Action Recognition, Classification

Abstract

The construction of models for video action classification is progressing rapidly. However, the performance of those models can still be easily improved by ensembling them with the same models trained on different modalities (e.g. optical flow). Unfortunately, it is computationally expensive to use several modalities during inference. Recent works examine ways to integrate the advantages of multi-modality into a single RGB model. Yet there is still room for improvement. In this paper, we explore various methods to embed the ensemble power into a single model. We show that proper initialization, as well as mutual modality learning, enhances single-modality models. As a result, we achieve state-of-the-art results on the Something-Something-v2 benchmark.
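To make the two ideas in the abstract concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes that "ensembling" means averaging the softmax outputs of per-modality models (late fusion), and that "mutual learning" adds a KL term pulling one model's predictions toward its peer's, in the style of deep mutual learning. The function names (`ensemble_predict`, `mutual_learning_loss`) and the unweighted sum of the two loss terms are illustrative assumptions; the paper's exact loss and training schedule are not given in this abstract.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def ensemble_predict(logits_per_modality):
    """Late fusion (assumed): average class probabilities across modalities."""
    probs = [softmax(logits) for logits in logits_per_modality]
    return np.mean(probs, axis=0)

def mutual_learning_loss(logits_a, logits_b, labels):
    """Loss for model A given its peer B (hypothetical formulation):
    cross-entropy on the labels plus KL(p_B || p_A)."""
    p_a = softmax(logits_a)
    p_b = softmax(logits_b)
    rows = np.arange(len(labels))
    cross_entropy = -np.log(p_a[rows, labels]).mean()
    kl_to_peer = (p_b * (np.log(p_b) - np.log(p_a))).sum(axis=-1).mean()
    return cross_entropy + kl_to_peer

# Toy logits for one clip and three classes (made-up numbers).
rgb_logits = np.array([[2.0, 1.0, 0.0]])
flow_logits = np.array([[0.0, 1.0, 3.0]])
labels = np.array([0])

fused = ensemble_predict([rgb_logits, flow_logits])
loss_rgb = mutual_learning_loss(rgb_logits, flow_logits, labels)
```

The ensemble needs both networks, and the optical flow itself, at inference time. A mutual-learning term is used only during training, so a single-modality model trained this way can be deployed on its own, which is the cost trade-off the abstract describes.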

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Activity Recognition | Something-Something V2 | Top-1 Accuracy | 69.02 | MML (ensemble) |
| Activity Recognition | Something-Something V2 | Top-5 Accuracy | 92.7 | MML (ensemble) |
| Activity Recognition | Something-Something V2 | Top-1 Accuracy | 66.83 | MML (single) |
| Activity Recognition | Something-Something V2 | Top-5 Accuracy | 91.3 | MML (single) |
| Action Recognition | Something-Something V2 | Top-1 Accuracy | 69.02 | MML (ensemble) |
| Action Recognition | Something-Something V2 | Top-5 Accuracy | 92.7 | MML (ensemble) |
| Action Recognition | Something-Something V2 | Top-1 Accuracy | 66.83 | MML (single) |
| Action Recognition | Something-Something V2 | Top-5 Accuracy | 91.3 | MML (single) |
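The Top-1 and Top-5 metrics reported above count a clip as correct when its true label is among the k highest-scoring classes. The sketch below shows one common way to compute this; the helper name `top_k_accuracy` and the toy scores are illustrative assumptions and are unrelated to the benchmark's official evaluation code.

```python
import numpy as np

def top_k_accuracy(scores, labels, k=1):
    """Percentage of rows whose true label is among the k highest scores."""
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    # argsort is ascending, so the last k columns hold the top-k class indices.
    top_k = np.argsort(scores, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return 100.0 * hits.mean()

# Three toy clips, three classes (made-up scores).
scores = np.array([
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
])
labels = np.array([1, 1, 0])

top1 = top_k_accuracy(scores, labels, k=1)
top2 = top_k_accuracy(scores, labels, k=2)
```

Top-k accuracy can only stay the same or rise as k grows, which is why the Top-5 values in the table are higher than the Top-1 values for the same model.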

Related Papers

Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
Safeguarding Federated Learning-based Road Condition Classification (2025-07-16)
AI-Enhanced Pediatric Pneumonia Detection: A CNN-Based Approach Using Data Augmentation and Generative Adversarial Networks (GANs) (2025-07-13)
An Efficient Approach for Muscle Segmentation and 3D Reconstruction Using Keypoint Tracking in MRI Scan (2025-07-11)
Learning to Track Any Points from Human Motion (2025-07-08)