
Joyful: Joint Modality Fusion and Graph Contrastive Learning for Multimodal Emotion Recognition

Dongyuan Li, Yusong Wang, Kotaro Funakoshi, Manabu Okumura

2023-11-18 · Emotion Recognition in Conversation · Multimodal Emotion Recognition · Contrastive Learning · Face Swapping · Emotion Recognition

Paper · PDF · Code (official)

Abstract

Multimodal emotion recognition aims to recognize the emotion of each utterance from multiple modalities, and has received increasing attention for its applications in human-machine interaction. Current graph-based methods fail to simultaneously capture global contextual features and diverse local uni-modal features in a dialogue. Furthermore, as the number of graph layers increases, they easily fall into over-smoothing. In this paper, we propose a method for joint modality fusion and graph contrastive learning for multimodal emotion recognition (Joyful), in which multimodal fusion, contrastive learning, and emotion recognition are jointly optimized. Specifically, we first design a new multimodal fusion mechanism that provides deep interaction and fusion between the global contextual and uni-modal-specific features. We then introduce a graph contrastive learning framework with inter-view and intra-view contrastive losses to learn more distinguishable representations for samples with different sentiments. Extensive experiments on three benchmark datasets show that Joyful achieves state-of-the-art (SOTA) performance against all baselines.
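The abstract names two mechanisms: a fusion step that mixes a global contextual representation with uni-modal features, and a graph contrastive framework with inter-view and intra-view losses. The sketch below illustrates plausible forms of both in PyTorch; the gated fusion, the InfoNCE-style inter-view loss, the label-based intra-view loss, and all module names and dimensions are assumptions for illustration, not the authors' actual implementation.

```python
# Minimal sketch of (1) fusing a global contextual vector with a uni-modal
# feature vector and (2) inter-view / intra-view contrastive losses over two
# augmented "views" of utterance embeddings. Assumed, not Joyful's exact code.
import torch
import torch.nn as nn
import torch.nn.functional as F


class GatedFusion(nn.Module):
    """Fuse global contextual and uni-modal features via a learned gate
    (a common fusion pattern; hypothetical stand-in for Joyful's mechanism)."""

    def __init__(self, dim: int):
        super().__init__()
        self.gate = nn.Linear(2 * dim, dim)

    def forward(self, global_ctx: torch.Tensor, unimodal: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([global_ctx, unimodal], dim=-1)))
        return g * global_ctx + (1 - g) * unimodal


def inter_view_loss(z1: torch.Tensor, z2: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """Inter-view contrastive loss: the same utterance in two graph views is a
    positive pair, all other utterances are negatives (standard InfoNCE)."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature               # (N, N) similarity matrix
    targets = torch.arange(z1.size(0), device=z1.device)
    return F.cross_entropy(logits, targets)


def intra_view_loss(z: torch.Tensor, labels: torch.Tensor, temperature: float = 0.5) -> torch.Tensor:
    """Intra-view contrastive loss within one view: utterances that share an
    emotion label attract, the rest repel (supervised-contrastive style)."""
    z = F.normalize(z, dim=-1)
    sim = z @ z.t() / temperature
    n = z.size(0)
    eye = torch.eye(n, dtype=torch.bool, device=z.device)
    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~eye
    # Log-softmax over all other utterances, averaged over positive pairs.
    log_prob = sim - torch.logsumexp(sim.masked_fill(eye, float("-inf")), dim=1, keepdim=True)
    denom = pos.sum(dim=1).clamp(min=1)
    return -(log_prob * pos).sum(dim=1).div(denom).mean()


if __name__ == "__main__":
    N, D = 8, 16                                     # toy sizes for the demo
    fused = GatedFusion(D)(torch.randn(N, D), torch.randn(N, D))
    z_v1, z_v2 = torch.randn(N, D), torch.randn(N, D)  # would come from a GNN encoder
    labels = torch.randint(0, 4, (N,))               # 4 emotion classes, e.g. IEMOCAP-4
    loss = inter_view_loss(z_v1, z_v2) + intra_view_loss(z_v1, labels)
    print(fused.shape, float(loss))
```

In a full pipeline, the two views would presumably be produced by graph augmentations (e.g., edge dropping) of the same dialogue graph before a GNN encoder, with the fused features as node inputs; joint optimization then sums these losses with the emotion-classification loss.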

Results

Task | Dataset | Metric | Value | Model
Emotion Recognition | IEMOCAP-4 | Weighted F1 | 85.7 | Joyful
Emotion Recognition | IEMOCAP-4 | Accuracy | 85.6 | Joyful
Emotion Recognition | MELD | Accuracy | 62.53 | Joyful
Emotion Recognition | MELD | Weighted F1 | 61.77 | Joyful
Emotion Recognition | IEMOCAP | Accuracy | 71 | Joyful
Emotion Recognition | IEMOCAP | Weighted F1 | 70.5 | Joyful
Multimodal Emotion Recognition | IEMOCAP-4 | Accuracy | 85.6 | Joyful
Multimodal Emotion Recognition | IEMOCAP-4 | Weighted F1 | 85.7 | Joyful
Multimodal Emotion Recognition | MELD | Accuracy | 62.53 | Joyful
Multimodal Emotion Recognition | MELD | Weighted F1 | 61.77 | Joyful
Multimodal Emotion Recognition | IEMOCAP | Accuracy | 71 | Joyful
Multimodal Emotion Recognition | IEMOCAP | Weighted F1 | 70.5 | Joyful

Related Papers

Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation (2025-07-21)
SemCSE: Semantic Contrastive Sentence Embeddings Using LLM-Generated Summaries For Scientific Abstracts (2025-07-17)
HapticCap: A Multimodal Dataset and Task for Understanding User Experience of Vibration Haptic Signals (2025-07-17)
Overview of the TalentCLEF 2025: Skill and Job Title Intelligence for Human Capital Management (2025-07-17)
SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)
SHIELD: A Secure and Highly Enhanced Integrated Learning for Robust Deepfake Detection against Adversarial Attacks (2025-07-17)
Camera-based implicit mind reading by capturing higher-order semantic dynamics of human gaze within environmental context (2025-07-17)
Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)