Two-Stream Video Classification with Cross-Modality Attention

Lu Chi, Guiyu Tian, Yadong Mu, Qi Tian

2019-08-01
Action Classification, Video Classification, General Classification, Action Recognition, Classification, Vocal Bursts Valence Prediction

Abstract

Fusing multi-modality information is known to bring significant improvement in video classification. However, the most popular method is still to simply fuse each stream's prediction scores at the last stage. A natural question is whether a more effective way to fuse information across modalities exists. With the development of attention mechanisms in natural language processing, many successful applications of attention have emerged in computer vision. In this paper, we propose a cross-modality attention operation, which obtains information from the other modality more effectively than two-stream fusion. Correspondingly, we implement a compatible block named the CMA block, which is a wrapper around our proposed attention operation. CMA can be plugged into many existing architectures. In the experiments, we comprehensively compare our method with the two-stream and non-local models widely used in video classification. All experiments clearly demonstrate the superior performance of our proposed method. We also analyze the advantages of the CMA block by visualizing the attention map, which intuitively shows how the block helps the final prediction.
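
To make the contrast in the abstract concrete, here is a minimal NumPy sketch of the two ideas: late fusion of per-class scores, and an attention step in which one stream queries the features of the other. This is an illustration under stated assumptions, not the paper's implementation. The function names, the projection matrices (w_q, w_k, w_v, w_out), the flattened (positions, channels) feature layout, the square-root scaling, and the zero-initialized output projection are all hypothetical choices, borrowed from common attention and non-local block designs.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def late_fusion(rgb_scores, flow_scores, w=0.5):
    # Two-stream baseline: weighted average of each stream's class scores.
    return w * rgb_scores + (1.0 - w) * flow_scores


def cross_modality_attention(x, y, w_q, w_k, w_v, w_out):
    # x: (N, C) features of the stream being updated (e.g. RGB).
    # y: (M, C) features of the other stream (e.g. optical flow).
    q = x @ w_q                      # queries come from this stream, (N, D)
    k = y @ w_k                      # keys come from the other stream, (M, D)
    v = y @ w_v                      # values come from the other stream, (M, D)
    # Scaling by sqrt(D) is an assumption taken from Transformer-style attention.
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]), axis=-1)   # (N, M)
    z = attn @ v                     # information gathered from the other stream
    # Residual connection, so the block can wrap an existing layer's output.
    return x + z @ w_out, attn


rng = np.random.default_rng(0)
N, M, C, D = 6, 8, 16, 4             # hypothetical sizes
rgb = rng.standard_normal((N, C))
flow = rng.standard_normal((M, C))
w_q, w_k, w_v = (rng.standard_normal((C, D)) * 0.1 for _ in range(3))
w_out = np.zeros((D, C))             # zero init: the block starts as an identity map

out, attn = cross_modality_attention(rgb, flow, w_q, w_k, w_v, w_out)
fused = late_fusion(np.array([0.2, 0.8]), np.array([0.6, 0.4]))
```

With the output projection initialized to zero, the residual form leaves the input features unchanged, which is one plausible reason such a block can be inserted into a pretrained network without disturbing it at the start of training. The attention matrix has one row per position in the querying stream, and this is the kind of map the abstract refers to visualizing.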

Results

Task	Dataset	Metric	Value	Model
Video	Kinetics-400	Acc@1	75.98	CMA iter1 (16 frames)
Activity Recognition	UCF101	3-fold Accuracy	96.5	CMA iter1-S
Action Recognition	UCF101	3-fold Accuracy	96.5	CMA iter1-S

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
Safeguarding Federated Learning-based Road Condition Classification (2025-07-16)
AI-Enhanced Pediatric Pneumonia Detection: A CNN-Based Approach Using Data Augmentation and Generative Adversarial Networks (GANs) (2025-07-13)
Fuzzy Classification Aggregation for a Continuum of Agents (2025-07-06)
Hybrid-View Attention for csPCa Classification in TRUS (2025-07-04)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)