Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

MSAF: Multimodal Split Attention Fusion

Lang Su, Chuqing Hu, Guofa Li, Dongpu Cao

2020-12-13 · Sentiment Analysis · Multimodal Emotion Recognition · Action Recognition · Multimodal Sentiment Analysis · Emotion Recognition
Paper · PDF · Code (official)

Abstract

Multimodal learning mimics the reasoning process of the human multi-sensory system, which is used to perceive the surrounding world. While making a prediction, the human brain tends to relate crucial cues from multiple sources of information. In this work, we propose a novel multimodal fusion module that learns to emphasize more contributive features across all modalities. Specifically, the proposed Multimodal Split Attention Fusion (MSAF) module splits each modality into channel-wise equal feature blocks and creates a joint representation that is used to generate soft attention for each channel across the feature blocks. Further, the MSAF module is designed to be compatible with features of various spatial dimensions and sequence lengths, suitable for both CNNs and RNNs. Thus, MSAF can be easily added to fuse features of any unimodal networks and utilize existing pretrained unimodal model weights. To demonstrate the effectiveness of our fusion module, we design three multimodal networks with MSAF for emotion recognition, sentiment analysis, and action recognition tasks. Our approach achieves competitive results in each task and outperforms other application-specific networks and multimodal fusion benchmarks.
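To make the mechanism described above concrete, here is a minimal, illustrative PyTorch sketch of a split-attention-style fusion step, not the authors' official implementation. The class name SplitAttentionFusion and the block_size and reduction hyperparameters are assumptions chosen for the example, and it handles only pooled per-modality feature vectors rather than the full CNN/RNN feature maps covered in the paper.

```python
import torch
import torch.nn as nn

class SplitAttentionFusion(nn.Module):
    """Illustrative sketch of a split-attention-style fusion block.

    Each modality's feature vector is split into channel-wise blocks of
    equal size; a joint representation pooled over all blocks drives a
    per-channel soft attention gate for every block. Hyperparameters are
    assumptions for this example, not the paper's reported settings.
    """

    def __init__(self, channels_per_modality, block_size=64, reduction=4):
        super().__init__()
        # Each modality's channel count must be divisible by block_size.
        assert all(c % block_size == 0 for c in channels_per_modality)
        self.num_blocks = [c // block_size for c in channels_per_modality]
        hidden = max(block_size // reduction, 1)
        # Shared bottleneck applied to the joint (pooled) representation.
        self.squeeze = nn.Sequential(nn.Linear(block_size, hidden), nn.ReLU())
        # One gate head per feature block, producing per-channel weights.
        self.gates = nn.ModuleList(
            nn.Linear(hidden, block_size) for _ in range(sum(self.num_blocks))
        )

    def forward(self, features):
        # features: list of (batch, C_m) tensors, one per modality.
        blocks = []
        for f, n in zip(features, self.num_blocks):
            blocks.extend(f.chunk(n, dim=1))            # channel-wise split
        joint = torch.stack(blocks, dim=0).mean(dim=0)  # joint representation
        z = self.squeeze(joint)                         # shared bottleneck
        fused, start = [], 0
        for f, n in zip(features, self.num_blocks):
            gated = [blk * torch.sigmoid(g(z))          # soft channel attention
                     for blk, g in zip(f.chunk(n, dim=1),
                                       self.gates[start:start + n])]
            fused.append(torch.cat(gated, dim=1))
            start += n
        return fused                                    # re-weighted modality features
```

As a usage example under the same assumptions, fusing a 1024-channel RGB feature with a 256-channel pose feature would be SplitAttentionFusion([1024, 256])([rgb_feat, pose_feat]), which returns the two re-weighted tensors for the downstream task heads.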

Results

Task                   Dataset     Metric          Value   Model
Activity Recognition   NTU RGB+D   Accuracy (CS)   92.24   MSAF (RGB+Pose)
Action Recognition     NTU RGB+D   Accuracy (CS)   92.24   MSAF (RGB+Pose)

Related Papers

Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation (2025-07-21)
AdaptiSent: Context-Aware Adaptive Attention for Multimodal Aspect-Based Sentiment Analysis (2025-07-17)
A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Camera-based implicit mind reading by capturing higher-order semantic dynamics of human gaze within environmental context (2025-07-17)
AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles (2025-07-15)
DCR: Quantifying Data Contamination in LLMs Evaluation (2025-07-15)
A Robust Incomplete Multimodal Low-Rank Adaptation Approach for Emotion Recognition (2025-07-15)
SentiDrop: A Multi Modal Machine Learning model for Predicting Dropout in Distance Learning (2025-07-14)