Yuxiang Lin, Jingdong Sun, Zhi-Qi Cheng, Jue Wang, Haomin Liang, Zebang Cheng, Yifei Dong, Jun-Yan He, Xiaojiang Peng, Xian-Sheng Hua
Most existing emotion analysis emphasizes which emotion arises (e.g., happy, sad, angry) but neglects the deeper why. We propose Emotion Interpretation (EI), which focuses on the causal factors, whether explicit (e.g., observable objects, interpersonal interactions) or implicit (e.g., cultural context, off-screen events), that drive emotional responses. Unlike traditional emotion recognition, EI requires reasoning about triggers rather than mere labeling. To facilitate EI research, we present EIBench, a large-scale benchmark comprising 1,615 basic EI samples and 50 complex EI samples featuring multifaceted emotions; each instance demands a rationale-based explanation rather than a straightforward category. We further propose a Coarse-to-Fine Self-Ask (CFSA) annotation pipeline, which guides Vision Large Language Models (VLLMs) through iterative question-answer rounds to yield high-quality labels at scale. Extensive evaluations of open-source and proprietary large language models under four experimental settings reveal consistent performance gaps, especially in more intricate scenarios, underscoring EI's potential to enrich empathetic, context-aware AI applications. Our benchmark and methods are publicly available at https://github.com/Lum1104/EIBench, offering a foundation for advanced multimodal causal analysis and next-generation affective computing.
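The CFSA pipeline described above can be sketched as an iterative question-answer loop. The sketch below is schematic only: `ask_vlm` is a hypothetical stand-in for a real VLLM call, and the coarse-to-fine question schedule is illustrative, not the paper's exact prompts.

```python
# Schematic sketch of a Coarse-to-Fine Self-Ask (CFSA) style annotation loop.
# `ask_vlm` is a hypothetical placeholder for a real Vision Language Model call.

def ask_vlm(image_id, question, context):
    # Stub: a real implementation would query a VLLM (e.g., via an API),
    # conditioning on the image and the accumulated Q-A context.
    return f"answer({image_id!r}, {question!r})"

def cfsa_annotate(image_id, rounds=None):
    """Run coarse-to-fine question-answer rounds; collect a rationale label."""
    if rounds is None:
        rounds = [
            "What emotion does the subject display?",           # coarse
            "Which visible objects or interactions relate?",    # finer
            "What implicit context could explain the emotion?", # finest
        ]
    context = []
    for question in rounds:
        answer = ask_vlm(image_id, question, context)
        context.append((question, answer))
    # The accumulated Q-A pairs form the rationale-based annotation.
    return context

label = cfsa_annotate("img_001")
print(len(label))  # three Q-A rounds
```

In practice, each round's answer would feed back into the next prompt so later, finer questions are grounded in earlier, coarser ones.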
| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| Emotion Interpretation | EIBench (complex) | Recall | 39.27 | ChatGPT-4o |
| Emotion Interpretation | EIBench (complex) | Recall | 39.16 | LLaVA-NeXT (13B) |
| Emotion Interpretation | EIBench (complex) | Recall | 38.71 | LLaVA-NeXT (7B) |
| Emotion Interpretation | EIBench (complex) | Recall | 38.10 | LLaVA-1.5 (13B) |
| Emotion Interpretation | EIBench (complex) | Recall | 35.37 | LLaVA-NeXT (34B) |
| Emotion Interpretation | EIBench (complex) | Recall | 35.10 | MiniGPT-v2 |
| Emotion Interpretation | EIBench (complex) | Recall | 30.90 | Video-LLaVA |
| Emotion Interpretation | EIBench (complex) | Recall | 28.00 | ChatGPT-4V |
| Emotion Interpretation | EIBench (complex) | Recall | 27.90 | Otter |
| Emotion Interpretation | EIBench (complex) | Recall | 24.00 | Claude-3-Haiku |
| Emotion Interpretation | EIBench (complex) | Recall | 22.00 | Qwen-VL-Chat |
| Emotion Interpretation | EIBench (complex) | Recall | 21.37 | Claude-3-Sonnet |
| Emotion Interpretation | EIBench (complex) | Recall | 20.37 | Qwen-VL-Plus |
| Emotion Interpretation | EIBench (basic) | Recall | 63.24 | Claude-3-Haiku |
| Emotion Interpretation | EIBench (basic) | Recall | 54.37 | LLaVA-1.5 (13B) |
| Emotion Interpretation | EIBench (basic) | Recall | 54.33 | LLaVA-NeXT (13B) |
| Emotion Interpretation | EIBench (basic) | Recall | 54.10 | Claude-3-Sonnet |
| Emotion Interpretation | EIBench (basic) | Recall | 53.82 | LLaVA-NeXT (7B) |
| Emotion Interpretation | EIBench (basic) | Recall | 52.89 | MiniGPT-v2 |
| Emotion Interpretation | EIBench (basic) | Recall | 49.99 | ChatGPT-4o |
| Emotion Interpretation | EIBench (basic) | Recall | 49.26 | Video-LLaVA |
| Emotion Interpretation | EIBench (basic) | Recall | 49.03 | LLaVA-NeXT (34B) |
| Emotion Interpretation | EIBench (basic) | Recall | 46.86 | ChatGPT-4V |
| Emotion Interpretation | EIBench (basic) | Recall | 42.81 | Otter |
| Emotion Interpretation | EIBench (basic) | Recall | 31.00 | Qwen-VL-Plus |
| Emotion Interpretation | EIBench (basic) | Recall | 26.45 | Qwen-VL-Chat |
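A recall-style score of the kind tabulated above can be illustrated as the fraction of ground-truth causal factors that a model's explanation covers. This is a simplified string-matching sketch; the benchmark's actual matching procedure may differ (e.g., semantic rather than exact matching), and the factor names below are invented for illustration.

```python
# Illustrative recall over ground-truth causal factors (simplified sketch).
# EIBench's real scoring may use semantic matching; this uses substring checks.

def factor_recall(gold_factors, predicted_text):
    """Percentage of ground-truth factors mentioned in the model's explanation."""
    text = predicted_text.lower()
    hits = sum(1 for factor in gold_factors if factor.lower() in text)
    return 100.0 * hits / len(gold_factors)

# Hypothetical example: two of three gold factors appear in the prediction.
gold = ["crowded street", "lost wallet", "rain"]
pred = "The person looks distressed because of the rain and a lost wallet."
print(round(factor_recall(gold, pred), 2))  # 66.67
```

Under such a metric, higher scores mean the model's rationale recovers more of the annotated explicit and implicit triggers.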