Evaluating Variants of wav2vec 2.0 on Affective Vocal Burst Tasks

Bagus Tris Atmaja, Akira Sasou

Published: 2023-05-05
Venue: ICASSP 2023
Tasks: Speech Recognition, Automatic Speech Recognition, Vocal Bursts Intensity Prediction, Self-Supervised Learning, Vocal Bursts Type Prediction, Cultural Vocal Bursts Intensity Prediction, Vocal Bursts Valence Prediction


Abstract

The search for emotional biomarkers in the human voice is a challenging research area. Whereas previous studies focused on predicting affective states from speech, this study explores several tasks on affective vocal bursts. Building on the success of self-supervised learning in automatic speech recognition, we extracted acoustic embeddings using variants of wav2vec 2.0 for four affective vocal burst tasks: High, Two, Culture, and Type. Using a similar architecture for all tasks, our evaluation of these embeddings reveals the potential of wav2vec 2.0 variants over conventional acoustic features in affective vocal burst tasks. We evaluated both the conventional acoustic features and the acoustic embeddings across twenty runs with different random seeds and report the maximum and average scores, with their standard deviations, on the validation set. The three highest validation scores for all tasks guided the generation of predictions for the test set. Compared with previous studies, our test scores show remarkable improvements.
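The multi-seed protocol described in the abstract (maximum, average, and standard deviation over twenty seeds, followed by picking the three best validation runs) can be sketched as below. This is a minimal illustration, not the authors' code: `evaluate_with_seed` is a hypothetical stand-in that returns a placeholder number instead of training a model, `summarize_seeds` is an illustrative helper name, and the use of the sample standard deviation is an assumption, since the abstract does not say which estimator was used.

```python
import random
import statistics


def evaluate_with_seed(seed: int) -> float:
    """Hypothetical stand-in for one train/validate run with a given seed.

    Returns a placeholder value, NOT a real result from the paper.
    """
    rng = random.Random(seed)
    return 0.5 + 0.1 * rng.random()


def summarize_seeds(scores: dict[int, float], top_k: int = 3):
    """Return (summary statistics, seeds of the top_k highest scores)."""
    values = list(scores.values())
    summary = {
        "max": max(values),
        "mean": statistics.mean(values),
        # Assumption: sample standard deviation (n - 1 in the denominator).
        "std": statistics.stdev(values),
    }
    # Seeds sorted by validation score, best first.
    top_seeds = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return summary, top_seeds


# Twenty seeds, mirroring the count stated in the abstract.
scores = {seed: evaluate_with_seed(seed) for seed in range(20)}
summary, top_seeds = summarize_seeds(scores)
print(summary)
print("seeds with the three highest validation scores:", top_seeds)
```

In a real experiment, the models trained with the seeds in `top_seeds` would be the ones used to produce test-set predictions.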
