Efficient Long-distance Latent Relation-aware Graph Neural Network for Multi-modal Emotion Recognition in Conversations

Yuntao Shou, Wei Ai, JiaYi Du, Tao Meng, Haiyan Liu, Nan Yin

2024-06-27Emotion Recognition in Conversation Emotion Recognition

Abstract

The task of multi-modal emotion recognition in conversation (MERC) aims to analyze the genuine emotional state of each utterance based on the multi-modal information in the conversation, which is crucial for conversation understanding. Existing methods focus on using graph neural networks (GNN) to model conversational relationships and capture contextual latent semantic relationships. However, due to the complexity of GNN, existing methods cannot efficiently capture the potential dependencies between long-distance utterances, which limits the performance of MERC. In this paper, we propose an Efficient Long-distance Latent Relation-aware Graph Neural Network (ELR-GNN) for multi-modal emotion recognition in conversations. Specifically, we first use pre-extracted text, video and audio features as input to Bi-LSTM to capture contextual semantic information and obtain low-level utterance features. Then, we use low-level utterance features to construct a conversational emotion interaction graph. To efficiently capture the potential dependencies between long-distance utterances, we use the dilated generalized forward push algorithm to precompute the emotional propagation between global utterances and design an emotional relation-aware operator to capture the potential semantic associations between different utterances. Furthermore, we combine early fusion and adaptive late fusion mechanisms to fuse latent dependency information between speaker relationship information and context. Finally, we obtain high-level discourse features and feed them into MLP for emotion prediction. Extensive experimental results show that ELR-GNN achieves state-of-the-art performance on the benchmark datasets IEMOCAP and MELD, with running times reduced by 52\% and 35\%, respectively.

Results

Task	Dataset	Metric	Value	Model
Emotion Recognition	MELD	Accuracy	68.7	ELR-GNN
Emotion Recognition	MELD	Weighted-F1	69.9	ELR-GNN
Emotion Recognition	IEMOCAP	Accuracy	70.6	ELR-GNN
Emotion Recognition	IEMOCAP	Weighted-F1	70.9	ELR-GNN

Related Papers

Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation2025-07-21 Camera-based implicit mind reading by capturing higher-order semantic dynamics of human gaze within environmental context2025-07-17 A Robust Incomplete Multimodal Low-Rank Adaptation Approach for Emotion Recognition2025-07-15 Dynamic Parameter Memory: Temporary LoRA-Enhanced LLM for Long-Sequence Emotion Recognition in Conversation2025-07-11 CAST-Phys: Contactless Affective States Through Physiological signals Database2025-07-08 Exploring Remote Physiological Signal Measurement under Dynamic Lighting Conditions at Night: Dataset, Experiment, and Analysis2025-07-06 How to Retrieve Examples in In-context Learning to Improve Conversational Emotion Recognition using Large Language Models?2025-06-25 MATER: Multi-level Acoustic and Textual Emotion Representation for Interpretable Speech Emotion Recognition2025-06-24