Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


emoDARTS: Joint Optimisation of CNN & Sequential Neural Network Architectures for Superior Speech Emotion Recognition

Thejan Rajapakshe, Rajib Rana, Sara Khalifa, Berrak Sisman, Bjorn W. Schuller, Carlos Busso

2024-03-21 · Neural Architecture Search · Speech Emotion Recognition · Emotion Recognition

Paper | PDF | Code (official)

Abstract

Speech Emotion Recognition (SER) is crucial for enabling computers to understand the emotions conveyed in human communication. With recent advancements in Deep Learning (DL), the performance of SER models has significantly improved. However, designing an optimal DL architecture requires specialised knowledge and experimental assessments. Fortunately, Neural Architecture Search (NAS) provides a potential solution for automatically determining the best DL model. Differentiable Architecture Search (DARTS) is a particularly efficient method for discovering optimal models. This study presents emoDARTS, a DARTS-optimised joint CNN and Sequential Neural Network (SeqNN: LSTM, RNN) architecture that enhances SER performance. The literature supports coupling a CNN with an LSTM to improve performance. While DARTS has previously been used to choose CNN and LSTM operations independently, our technique adds a novel mechanism for selecting CNN and SeqNN operations jointly using DARTS. Unlike earlier work, we do not impose limits on the layer order of the CNN; instead, we let DARTS choose the best layer order inside the DARTS cell. Evaluating our approach on the IEMOCAP, MSP-IMPROV, and MSP-Podcast datasets, we demonstrate that emoDARTS outperforms conventionally designed CNN-LSTM models and surpasses the best-reported SER results achieved through DARTS on CNN-LSTM.
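The core idea behind DARTS that the abstract relies on is the continuous relaxation of the architecture choice: instead of picking one candidate operation per edge, every candidate is applied and the outputs are combined with softmax-weighted, learnable architecture parameters, making the search differentiable. A minimal sketch of that mixed operation is below; the toy operation set and the variable names (`alpha`, `mixed_op`) are illustrative assumptions, not the paper's actual CNN/SeqNN search space.

```python
import numpy as np

# Illustrative candidate operations for one DARTS edge. The paper's real
# search space (CNN and SeqNN operations) is far richer; this is a toy set.
ops = {
    "identity": lambda x: x,
    "relu": lambda x: np.maximum(x, 0.0),
    "tanh": np.tanh,
}

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def mixed_op(x, alpha):
    """DARTS continuous relaxation: apply every candidate op to x and sum
    the results weighted by softmax(alpha), so the architecture parameters
    alpha can be optimised by gradient descent alongside the model weights."""
    w = softmax(alpha)
    return sum(wi * op(x) for wi, op in zip(w, ops.values()))

x = np.array([-1.0, 0.5, 2.0])
alpha = np.array([0.1, 2.0, -1.0])  # learnable architecture parameters (assumed init)
y = mixed_op(x, alpha)

# After the search converges, the discrete architecture is derived by
# keeping the operation with the largest architecture weight per edge.
best = max(zip(alpha, ops), key=lambda t: t[0])[1]
```

In the paper's setting, the novelty is that one such relaxation spans both CNN and SeqNN candidate operations in a single cell, so the layer order is also chosen by the search rather than fixed in advance.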

Results

Task | Dataset | Metric | Value | Model
Emotion Recognition | IEMOCAP | UA CV | 0.7655 | emoDARTS
Emotion Recognition | IEMOCAP | WA CV | 0.7803 | emoDARTS
Emotion Recognition | MSP-IMPROV | UA | 0.6563 | emoDARTS
Speech Emotion Recognition | IEMOCAP | UA CV | 0.7655 | emoDARTS
Speech Emotion Recognition | IEMOCAP | WA CV | 0.7803 | emoDARTS
Speech Emotion Recognition | MSP-IMPROV | UA | 0.6563 | emoDARTS

(UA: unweighted accuracy; WA: weighted accuracy; CV: cross-validation)

Related Papers

- Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation (2025-07-21)
- DASViT: Differentiable Architecture Search for Vision Transformer (2025-07-17)
- Camera-based implicit mind reading by capturing higher-order semantic dynamics of human gaze within environmental context (2025-07-17)
- A Robust Incomplete Multimodal Low-Rank Adaptation Approach for Emotion Recognition (2025-07-15)
- Dynamic Parameter Memory: Temporary LoRA-Enhanced LLM for Long-Sequence Emotion Recognition in Conversation (2025-07-11)
- CAST-Phys: Contactless Affective States Through Physiological signals Database (2025-07-08)
- Exploring Remote Physiological Signal Measurement under Dynamic Lighting Conditions at Night: Dataset, Experiment, and Analysis (2025-07-06)
- How to Retrieve Examples in In-context Learning to Improve Conversational Emotion Recognition using Large Language Models? (2025-06-25)