Developing a High-performance Framework for Speech Emotion Recognition in Naturalistic Conditions Challenge for Emotional Attribute Prediction

Thanathai Lertpetchpun, Tiantian Feng, Dani Byrd, Shrikanth Narayanan

2025-06-12

Attribute Multi-Task Learning Speech Emotion Recognition Emotion Recognition Task 2

Abstract

Speech emotion recognition (SER) in naturalistic conditions presents a significant challenge for the speech processing community. Difficulties include labeling disagreement among annotators and imbalanced data distributions. This paper presents a reproducible framework that achieves first-place performance in Task 2 of the Emotion Recognition in Naturalistic Conditions Challenge (IS25-SER Challenge), evaluated on the MSP-Podcast dataset. Our system is designed to tackle these difficulties through multimodal learning, multi-task learning, and imbalanced data handling. Specifically, our best system is trained by adding text embeddings, predicting gender, and including "Other" (O) and "No Agreement" (X) samples in the training set. Our systems secured both first and second place in the IS25-SER Challenge, and the top performance was achieved by a simple two-system ensemble.
