Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

BERSting at the Screams: A Benchmark for Distanced, Emotional and Shouted Speech Recognition

Paige Tuttösí, Mantaj Dhillon, Luna Sang, Shane Eastwood, Poorvi Bhatia, Quang Minh Dinh, Avni Kapoor, Yewon Jin, Angelica Lim

2025-04-30 | Speech Recognition | Automatic Speech Recognition | Automatic Speech Recognition (ASR) | speech-recognition | Speech Emotion Recognition | Emotion Recognition

Abstract

Some speech recognition tasks, such as automatic speech recognition (ASR), are approaching or have reached human performance in many reported metrics. Yet they continue to struggle in complex, real-world situations, such as with distanced speech. Previous challenges have released datasets to address the issue of distanced ASR; however, the focus remains primarily on distance, specifically relying on multi-microphone array systems. Here we present the B(asic) E(motion) R(andom phrase) S(hou)t(s) (BERSt) dataset. The dataset contains almost 4 hours of English speech from 98 actors with varying regional and non-native accents. The data was collected on smartphones in the actors' homes and therefore includes at least 98 different acoustic environments. The data also includes 7 different emotion prompts and both shouted and spoken utterances. The smartphones were placed in 19 different positions, including behind obstructions and in a different room than the actor. The data is publicly available and can be used to evaluate a variety of speech recognition tasks, including ASR, shout detection, and speech emotion recognition (SER). We provide initial benchmarks for the ASR and SER tasks and find that ASR performance degrades as distance and shout level increase and varies with the intended emotion. Our results show that the BERSt dataset is challenging for both ASR and SER, and that continued work is needed to improve the robustness of such systems for more accurate real-world use.

Results

| Task | Dataset | Metric | Value | Model |
| Emotion Recognition | BERSt | Unweighted Accuracy (UA) | 32.1 | DAWN-hidden-SVM |
| Emotion Recognition | BERSt | Weighted Accuracy (WA) | 32.2 | DAWN-hidden-SVM |
| Emotion Recognition | BERSt | Unweighted Accuracy (UA) | 23.3 | Wav2Small-VAD-SVM |
| Emotion Recognition | BERSt | Weighted Accuracy (WA) | 22.3 | Wav2Small-VAD-SVM |
| Emotion Recognition | BERSt | Unweighted Accuracy (UA) | 20.7 | Speechbrain Wav2Vec2 |
| Emotion Recognition | BERSt | Weighted Accuracy (WA) | 20.8 | Speechbrain Wav2Vec2 |
| Speech Emotion Recognition | BERSt | Unweighted Accuracy (UA) | 32.1 | DAWN-hidden-SVM |
| Speech Emotion Recognition | BERSt | Weighted Accuracy (WA) | 32.2 | DAWN-hidden-SVM |
| Speech Emotion Recognition | BERSt | Unweighted Accuracy (UA) | 23.3 | Wav2Small-VAD-SVM |
| Speech Emotion Recognition | BERSt | Weighted Accuracy (WA) | 22.3 | Wav2Small-VAD-SVM |
| Speech Emotion Recognition | BERSt | Unweighted Accuracy (UA) | 20.7 | Speechbrain Wav2Vec2 |
| Speech Emotion Recognition | BERSt | Weighted Accuracy (WA) | 20.8 | Speechbrain Wav2Vec2 |
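
The listing reports both UA and WA but does not define them. The sketch below assumes the convention common in the SER literature: WA is the overall fraction of correctly labeled utterances, and UA is the mean of the per-class recalls, so every emotion class counts equally regardless of how many utterances it has. The function names and the toy labels are hypothetical and are not taken from the paper or its code.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all utterances.

    Assumed definition (common SER convention), not confirmed by the paper.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean per-class recall: each class contributes equally.

    Assumed definition (common SER convention), not confirmed by the paper.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    totals = defaultdict(int)  # utterances per true class
    hits = defaultdict(int)    # correctly predicted utterances per true class
    for t, p in zip(y_true, y_pred):
        totals[t] += 1
        if t == p:
            hits[t] += 1
    recalls = [hits[c] / totals[c] for c in totals]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Toy, imbalanced example with hypothetical labels: a model that always
    # predicts the majority class looks good on WA but not on UA.
    y_true = ["emotion_a"] * 4 + ["emotion_b"]
    y_pred = ["emotion_a"] * 5
    print(f"WA = {weighted_accuracy(y_true, y_pred):.2f}")
    print(f"UA = {unweighted_accuracy(y_true, y_pred):.2f}")
```

Under these assumed definitions, a classifier guessing uniformly among the 7 emotion prompts would be expected to score a UA of roughly 14.3% (1/7), which gives a rough reference point for reading the values in the table.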
