Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

Ziyang Ma, Zhisheng Zheng, Jiaxin Ye, Jinchao Li, Zhifu Gao, Shiliang Zhang, Xie Chen

2023-12-23 | Sentiment Analysis | Self-Supervised Learning | Speech Emotion Recognition | Emotion Recognition

Abstract

We propose emotion2vec, a universal speech emotion representation model. emotion2vec is pre-trained on open-source unlabeled emotion data through self-supervised online distillation, combining an utterance-level loss and a frame-level loss during pre-training. emotion2vec outperforms state-of-the-art pre-trained universal models and emotion-specialist models on the mainstream IEMOCAP speech emotion recognition benchmark while training only linear layers. In addition, emotion2vec shows consistent improvements across speech emotion recognition datasets in 10 different languages. emotion2vec also achieves strong results on other emotion tasks, such as song emotion recognition, emotion prediction in conversation, and sentiment analysis. Comparison experiments, ablation experiments, and visualizations comprehensively demonstrate the universal capability of the proposed emotion2vec. To the best of our knowledge, emotion2vec is the first universal representation model for a broad range of emotion-related tasks, filling a gap in the field.
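The abstract describes pre-training that combines an utterance-level loss and a frame-level loss during online distillation (a teacher network supervising a student). The sketch below illustrates how such a combined objective can be formed; the MSE distance, mean pooling to the utterance level, and the `alpha` weighting are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def frame_loss(student_frames, teacher_frames):
    # Frame-level loss: distance between student and teacher embeddings,
    # averaged over all frames (MSE is an assumed choice of distance).
    return float(np.mean((student_frames - teacher_frames) ** 2))

def utterance_loss(student_frames, teacher_frames):
    # Utterance-level loss: pool frames to a single utterance embedding
    # (mean pooling is an assumption) and compare those.
    s_utt = student_frames.mean(axis=0)
    t_utt = teacher_frames.mean(axis=0)
    return float(np.mean((s_utt - t_utt) ** 2))

def combined_loss(student_frames, teacher_frames, alpha=0.5):
    # Weighted sum of the two losses; alpha is a hypothetical trade-off knob.
    return (alpha * utterance_loss(student_frames, teacher_frames)
            + (1 - alpha) * frame_loss(student_frames, teacher_frames))

rng = np.random.default_rng(0)
teacher = rng.normal(size=(50, 16))          # 50 frames, 16-dim embeddings
student = teacher + 0.1 * rng.normal(size=teacher.shape)
loss = combined_loss(student, teacher)
```

A student that exactly matches the teacher drives both terms to zero, so the combined loss rewards agreement at both granularities simultaneously.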

Results

Task | Dataset | Metric | Value | Model
Emotion Recognition | RESD | Unweighted Accuracy (UA) | 79.8 | emotion2vec+base
Emotion Recognition | RESD | Weighted Accuracy (WA) | 79.4 | emotion2vec+base
Emotion Recognition | RESD | Weighted F1 | 79.4 | emotion2vec+base
Emotion Recognition | RESD | Unweighted Accuracy (UA) | 69.1 | emotion2vec+large
Emotion Recognition | RESD | Weighted Accuracy (WA) | 69.5 | emotion2vec+large
Emotion Recognition | RESD | Weighted F1 | 68.8 | emotion2vec+large
Emotion Recognition | RESD | Unweighted Accuracy (UA) | 65.04 | emotion2vec
Emotion Recognition | RESD | Weighted Accuracy (WA) | 64.75 | emotion2vec
Emotion Recognition | RESD | Weighted F1 | 64.53 | emotion2vec
Speech Emotion Recognition | RESD | Unweighted Accuracy (UA) | 79.8 | emotion2vec+base
Speech Emotion Recognition | RESD | Weighted Accuracy (WA) | 79.4 | emotion2vec+base
Speech Emotion Recognition | RESD | Weighted F1 | 79.4 | emotion2vec+base
Speech Emotion Recognition | RESD | Unweighted Accuracy (UA) | 69.1 | emotion2vec+large
Speech Emotion Recognition | RESD | Weighted Accuracy (WA) | 69.5 | emotion2vec+large
Speech Emotion Recognition | RESD | Weighted F1 | 68.8 | emotion2vec+large
Speech Emotion Recognition | RESD | Unweighted Accuracy (UA) | 65.04 | emotion2vec
Speech Emotion Recognition | RESD | Weighted Accuracy (WA) | 64.75 | emotion2vec
Speech Emotion Recognition | RESD | Weighted F1 | 64.53 | emotion2vec
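The results above come from the linear-probe protocol the abstract describes: the pre-trained model is frozen and only linear layers are trained on top of its utterance-level representations. The sketch below shows that protocol end to end with a plain softmax layer trained by gradient descent; the synthetic Gaussian "embeddings", the 64-dimensional feature size, and the 4-class setup are all stand-in assumptions, not real emotion2vec outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-ins for frozen utterance-level embeddings: 4 emotion
# classes, 64-dim features, 400 utterances (all sizes are assumptions).
num_classes, dim, n = 4, 64, 400
centers = rng.normal(size=(num_classes, dim))
y = rng.integers(0, num_classes, size=n)
X = centers[y] + 0.3 * rng.normal(size=(n, dim))

# Linear probe: a single softmax layer; the backbone stays untouched.
W = np.zeros((dim, num_classes))
b = np.zeros(num_classes)
lr = 1.0
for _ in range(300):
    logits = X @ W + b
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    grad = (p - np.eye(num_classes)[y]) / n          # softmax cross-entropy grad
    W -= lr * (X.T @ grad)
    b -= lr * grad.sum(axis=0)

pred = (X @ W + b).argmax(axis=1)
accuracy = float((pred == y).mean())
```

Because only `W` and `b` are updated, probe accuracy directly measures how linearly separable the frozen representations make the emotion classes, which is the sense in which the table compares representation quality.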

Related Papers

- Long-Short Distance Graph Neural Networks and Improved Curriculum Learning for Emotion Recognition in Conversation (2025-07-21)
- AdaptiSent: Context-Aware Adaptive Attention for Multimodal Aspect-Based Sentiment Analysis (2025-07-17)
- A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
- Camera-based implicit mind reading by capturing higher-order semantic dynamics of human gaze within environmental context (2025-07-17)
- AI Wizards at CheckThat! 2025: Enhancing Transformer-Based Embeddings with Sentiment for Subjectivity Detection in News Articles (2025-07-15)
- DCR: Quantifying Data Contamination in LLMs Evaluation (2025-07-15)
- A Robust Incomplete Multimodal Low-Rank Adaptation Approach for Emotion Recognition (2025-07-15)
- SentiDrop: A Multi Modal Machine Learning model for Predicting Dropout in Distance Learning (2025-07-14)