Evaluating Variants of wav2vec 2.0 on Affective Vocal Burst Tasks

Bagus Tris Atmaja, Akira Sasou

2023-05-05 · ICASSP 2023
Tasks: Speech Recognition, Automatic Speech Recognition, Vocal Bursts Intensity Prediction, Self-Supervised Learning, Vocal Bursts Type Prediction, Cultural Vocal Bursts Intensity Prediction, Vocal Bursts Valence Prediction

Abstract

The search for emotional biomarkers within the human voice is a challenging research area. Previous studies focused on predicting affective states from speech; this study explores several tasks on affective vocal bursts. Building on the success of self-supervised learning in automatic speech recognition, we extracted acoustic embeddings using variants of wav2vec 2.0 for four affective vocal burst tasks: High, Two, Culture, and Type. Using a similar architecture for all tasks, the evaluation of these acoustic embeddings reveals the potential of wav2vec 2.0 variants over conventional acoustic features in affective vocal burst tasks. We evaluated both the conventional acoustic features and the acoustic embeddings over twenty runs with different random seeds and reported the maximum and average scores, with their standard deviations, on the validation set. The three highest validation scores for each task were used to guide the generation of predictions for the test set. We compared the test scores with previous studies and obtained remarkable improvements.
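The multi-seed reporting described in the abstract (maximum, average, and standard deviation of validation scores, plus the three best runs) can be illustrated with a minimal sketch. This is not the authors' code: `evaluate_with_seed` is a hypothetical stand-in that returns a pseudo-random number in place of a real train-and-validate run, the function and key names are invented for illustration, and the use of the sample standard deviation is an assumption, since the abstract does not say which estimator was used.

```python
import random
import statistics


def evaluate_with_seed(seed: int) -> float:
    """Hypothetical placeholder for one training/validation run.

    A real run would train a model with this seed and return its
    validation score; here we only draw a deterministic fake score.
    """
    rng = random.Random(seed)
    return 0.5 + 0.1 * rng.random()


def summarize_over_seeds(score_fn, seeds):
    """Run score_fn once per seed and aggregate the validation scores."""
    scores = [score_fn(seed) for seed in seeds]
    # Rank (seed, score) pairs from best to worst score.
    ranked = sorted(zip(seeds, scores), key=lambda pair: pair[1], reverse=True)
    return {
        "max": max(scores),
        "mean": statistics.mean(scores),
        # Sample standard deviation (assumption; the source does not specify).
        "std": statistics.stdev(scores),
        # Seeds of the three highest-scoring runs.
        "top3_seeds": [seed for seed, _ in ranked[:3]],
    }


# Twenty seeds, mirroring the twenty-seed evaluation in the abstract.
summary = summarize_over_seeds(evaluate_with_seed, list(range(20)))
print(summary)
```

In a real pipeline, the seeds in `top3_seeds` would identify which trained models to use when generating test-set predictions; how the authors combined those three runs is not stated in the abstract.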

Related Papers

Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine (2025-07-17)
NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech (2025-07-17)
A Semi-Supervised Learning Method for the Identification of Bad Exposures in Large Imaging Surveys (2025-07-17)
WhisperKit: On-device Real-time ASR with Billion-Scale Transformers (2025-07-14)
Self-supervised Learning on Camera Trap Footage Yields a Strong Universal Face Embedder (2025-07-14)
VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis (2025-07-08)
Speech Quality Assessment Model Based on Mixture of Experts: System-Level Performance Enhancement and Utterance-Level Challenge Analysis (2025-07-08)
A Hybrid Machine Learning Framework for Optimizing Crop Selection via Agronomic and Economic Forecasting (2025-07-06)