
Revisiting Multi-modal Emotion Learning with Broad State Space Models and Probability-guidance Fusion

Yuntao Shou, Tao Meng, FuChen Zhang, Nan Yin, Keqin Li

2024-04-27 · Emotion Recognition in Conversation · Emotion Classification · Disentanglement · Recommendation Systems · Emotion Recognition

Abstract

Multi-modal Emotion Recognition in Conversation (MERC) has received considerable attention in various fields, e.g., human-computer interaction and recommendation systems. Most existing works perform feature disentanglement and fusion to extract emotional contextual information from multi-modal features for emotion classification. After revisiting the characteristics of MERC, we argue that long-range contextual semantic information should be extracted in the feature disentanglement stage and that inter-modal semantic consistency should be maximized in the feature fusion stage. Recent State Space Models (SSMs) such as Mamba can efficiently model long-distance dependencies. Therefore, in this work we build on these insights to further improve the performance of MERC. Specifically, on the one hand, in the feature disentanglement stage, we propose a Broad Mamba, which does not rely on a self-attention mechanism for sequence modeling but instead uses state space models to compress emotional representations and a broad learning system to explore the potential data distribution in broad space. Unlike previous SSMs, we design a bidirectional SSM convolution to extract global contextual information. On the other hand, we design a probability-guided multi-modal fusion strategy that maximizes the consistency of information between modalities. Experimental results show that the proposed method overcomes the computational and memory limitations of the Transformer when modeling long-distance contexts and has great potential to become a next-generation general architecture for MERC.
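The abstract names two concrete mechanisms: a bidirectional SSM scan for long-range context and a probability-guided fusion of modalities. The paper's code is not reproduced on this page, so the sketch below is only a minimal PyTorch illustration of how those two ideas could be realized: the class names, the diagonal recurrence h_t = a * h_{t-1} + x_t, and the confidence-weighted fusion rule are all illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class BiSSMBlock(nn.Module):
    """Hypothetical bidirectional SSM block (not the authors' code).

    Runs a per-channel diagonal linear recurrence h_t = a * h_{t-1} + x_t
    forward and backward over the sequence and sums the two passes,
    mimicking a "bidirectional SSM convolution" for global context.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.decay = nn.Parameter(torch.zeros(dim))  # assumed parameterization
        self.in_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(dim, dim)

    def _scan(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, dim); a plain Python loop for clarity, not speed.
        a = torch.sigmoid(self.decay)  # keep the recurrence stable, a in (0, 1)
        h = torch.zeros_like(x[:, 0])
        states = []
        for t in range(x.size(1)):
            h = a * h + x[:, t]
            states.append(h)
        return torch.stack(states, dim=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        u = self.in_proj(x)
        fwd = self._scan(u)                  # left-to-right context
        bwd = self._scan(u.flip(1)).flip(1)  # right-to-left context
        return self.out_proj(fwd + bwd)

class ProbGuidedFusion(nn.Module):
    """Hypothetical probability-guided fusion of per-modality features.

    Each modality is scored by a shared shallow classifier; its softmax
    confidence weights that modality's features, so modalities that are
    more certain about the emotion label dominate the fused vector.
    """

    def __init__(self, dim: int, num_classes: int):
        super().__init__()
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, feats: list[torch.Tensor]) -> torch.Tensor:
        # feats: one (batch, dim) tensor per modality (e.g., text, audio, video).
        conf = torch.stack(
            [self.classifier(f).softmax(-1).max(-1).values for f in feats], dim=-1
        )
        weights = conf.softmax(-1)  # (batch, num_modalities)
        return sum(weights[:, i:i + 1] * f for i, f in enumerate(feats))

if __name__ == "__main__":
    x = torch.randn(2, 20, 64)      # (batch, utterances, dim) for one modality
    ctx = BiSSMBlock(64)(x)         # long-range contextual features
    fused = ProbGuidedFusion(64, 7)([ctx.mean(1), ctx.mean(1)])  # toy 2-modality fuse
    print(fused.shape)              # torch.Size([2, 64])
```

In practice the Python loop in `_scan` would be replaced by a parallel prefix scan or a fused kernel, as in the Mamba line of work; it is kept here only to make the recurrence explicit.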

Results

Task                 Dataset  Metric       Value  Model
Emotion Recognition  MELD     Accuracy     68     Mamba-like Model
Emotion Recognition  MELD     Weighted-F1  67.6   Mamba-like Model

Related Papers

Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation (2025-07-21)
CSD-VAR: Content-Style Decomposition in Visual Autoregressive Models (2025-07-18)
IP2: Entity-Guided Interest Probing for Personalized News Recommendation (2025-07-18)
A Reproducibility Study of Product-side Fairness in Bundle Recommendation (2025-07-18)
NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech (2025-07-17)
SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)
Camera-based implicit mind reading by capturing higher-order semantic dynamics of human gaze within environmental context (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)