Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


VLLMs Provide Better Context for Emotion Understanding Through Common Sense Reasoning

Alexandros Xenos, Niki Maria Foteinopoulou, Ioanna Ntinou, Ioannis Patras, Georgios Tzimiropoulos

2024-04-10 · Emotion Classification · Emotion Recognition in Context · Common Sense Reasoning

Paper · PDF · Code (official)

Abstract

Recognising emotions in context involves identifying the apparent emotions of an individual, taking into account contextual cues from the surrounding scene. Previous approaches to this task have involved the design of explicit scene-encoding architectures or the incorporation of external scene-related information, such as captions. However, these methods often utilise limited contextual information or rely on intricate training pipelines. In this work, we leverage the groundbreaking capabilities of Vision-and-Large-Language Models (VLLMs) to enhance in-context emotion classification, without introducing complexity to the training process, in a two-stage approach. In the first stage, we propose prompting VLLMs to generate descriptions in natural language of the subject's apparent emotion relative to the visual context. In the second stage, the descriptions are used as contextual information and, along with the image input, are used to train a transformer-based architecture that fuses text and visual features before the final classification task. Our experimental results show that the text and image features have complementary information, and our fused architecture significantly outperforms the individual modalities without any complex training methods. We evaluate our approach on three different datasets, namely, EMOTIC, CAER-S, and BoLD, and achieve state-of-the-art or comparable accuracy across all datasets and metrics compared to much more complex approaches. The code will be made publicly available on GitHub: https://github.com/NickyFot/EmoCommonSense.git
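The second stage described above (fusing VLLM-generated text features with visual features in a transformer before classification) can be sketched as follows. This is a minimal illustration, not the authors' exact architecture: the feature dimension, number of encoder layers, the CLS-token aggregation, and the 26-way output (assuming EMOTIC-style emotion categories) are all assumptions for the sketch.

```python
# Minimal sketch of text-visual feature fusion for in-context emotion
# classification. Pre-extracted image tokens and text tokens (e.g. from
# a vision backbone and a text encoder over the VLLM's description) are
# concatenated with a learnable CLS token, passed through a transformer
# encoder, and classified from the CLS position.
import torch
import torch.nn as nn

class FusionClassifier(nn.Module):
    def __init__(self, dim=512, num_classes=26, num_heads=8, num_layers=2):
        super().__init__()
        # Learnable CLS-style token that aggregates the fused sequence.
        self.cls_token = nn.Parameter(torch.zeros(1, 1, dim))
        layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=num_heads, batch_first=True
        )
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, img_feats, txt_feats):
        # img_feats: (B, N_img, dim) visual tokens
        # txt_feats: (B, N_txt, dim) text tokens from the VLLM description
        batch = img_feats.size(0)
        cls = self.cls_token.expand(batch, -1, -1)
        fused = torch.cat([cls, img_feats, txt_feats], dim=1)
        fused = self.encoder(fused)
        return self.head(fused[:, 0])  # classify from the CLS token

model = FusionClassifier()
logits = model(torch.randn(2, 49, 512), torch.randn(2, 16, 512))
print(logits.shape)  # (2, 26): one score per emotion class
```

Because the text and image tokens attend to each other inside the encoder, the classifier can exploit their complementary information, which is the effect the abstract reports for the fused architecture.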

Results

Task                | Dataset | Metric      | Value | Model
Emotion Recognition | BoLD    | AUC         | 69.83 | A. Xenos et al.
Emotion Recognition | BoLD    | Average mAP | 26.66 | A. Xenos et al.
Emotion Recognition | EMOTIC  | mAP         | 38.52 | A. Xenos et al.
Emotion Recognition | CAER    | Accuracy    | 93.08 | A. Xenos et al.

Related Papers

NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech (2025-07-17)
Comparing Apples to Oranges: A Dataset & Analysis of LLM Humour Understanding from Traditional Puns to Topical Jokes (2025-07-17)
LoSiA: Efficient High-Rank Fine-Tuning via Subnet Localization and Optimization (2025-07-06)
EditInspector: A Benchmark for Evaluation of Text-Guided Image Edits (2025-06-11)
CheckManual: A New Challenge and Benchmark for Manual-based Appliance Manipulation (2025-06-11)
Prime the search: Using large language models for guiding geometric task and motion planning by warm-starting tree search (2025-06-08)
AmbiK: Dataset of Ambiguous Tasks in Kitchen Environment (2025-06-04)
MMAFFBen: A Multilingual and Multimodal Affective Analysis Benchmark for Evaluating LLMs and VLMs (2025-05-30)