STARSS22: A dataset of spatial recordings of real scenes with spatiotemporal annotations of sound events

Archontis Politis, Kazuki Shimada, Parthasaarathy Sudarsanam, Sharath Adavanne, Daniel Krause, Yuichiro Koyama, Naoya Takahashi, Shusuke Takahashi, Yuki Mitsufuji, Tuomas Virtanen

2022-06-04
Task: Sound Event Localization and Detection

Abstract

This report presents the Sony-TAu Realistic Spatial Soundscapes 2022 (STARSS22) dataset for sound event localization and detection, comprising spatial recordings of real scenes collected in various interiors at two different sites. The dataset is captured with a high-resolution spherical microphone array and delivered in two 4-channel formats: first-order Ambisonics and tetrahedral microphone array. Sound events belonging to 13 target sound classes are annotated both temporally and spatially through a combination of human annotation and optical tracking. The dataset serves as the development and evaluation dataset for Task 3 of the DCASE2022 Challenge on Sound Event Localization and Detection and introduces significant new challenges for the task compared to previous iterations, which were based on synthetic spatialized sound scene recordings. Dataset specifications are detailed, including the recording and annotation process, the target classes and their presence, and the development and evaluation splits. Additionally, the report presents the baseline system that accompanies the dataset in the challenge, with emphasis on its differences from the baseline of previous iterations: namely, the introduction of the multi-ACCDOA representation to handle multiple simultaneous occurrences of events of the same class, and support for additional improved input features for the microphone array format. Results of the baseline indicate that, with a suitable training strategy, reasonable detection and localization performance can be achieved on real sound scene recordings. The dataset is available at https://zenodo.org/record/6387880.
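To make the multi-ACCDOA idea mentioned in the abstract more concrete, the following is a minimal sketch of how one output frame could be decoded. It is not the baseline's actual code. The function name `decode_multi_accdoa`, the nested-list layout (tracks, then classes, then an x/y/z vector), and the 0.5 activity threshold are all assumptions made for illustration. The only idea taken from the representation itself is that each track and class carries a Cartesian vector whose length signals activity and whose direction gives the direction of arrival. Merging of near-identical tracks, which a full system may perform, is omitted.

```python
import math

def decode_multi_accdoa(frame, threshold=0.5):
    """Decode one frame of a multi-ACCDOA-style output (illustrative only).

    frame: nested sequence indexed as frame[track][class] -> (x, y, z).
    Returns a list of (track, class_index, azimuth_deg, elevation_deg)
    for every vector whose length exceeds the (assumed) threshold.
    """
    events = []
    for track, per_class in enumerate(frame):
        for cls, (x, y, z) in enumerate(per_class):
            norm = math.sqrt(x * x + y * y + z * z)
            if norm <= threshold:
                continue  # vector too short: treat the class as inactive
            azimuth = math.degrees(math.atan2(y, x))
            # Clamp to guard against rounding slightly outside [-1, 1].
            elevation = math.degrees(math.asin(max(-1.0, min(1.0, z / norm))))
            events.append((track, cls, azimuth, elevation))
    return events
```

Because each class has several tracks, two active vectors for the same class in the same frame are reported as two separate events, which is what allows simultaneous same-class sources to be represented.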

Results

Task | Dataset | Metric | Value | Model
Sound Event Localization and Detection | STARSS22 | Class-dependent localization error | 29.3 | Baseline (FOA)
Sound Event Localization and Detection | STARSS22 | Class-dependent localization recall | 46 | Baseline (FOA)
Sound Event Localization and Detection | STARSS22 | Localization-dependent error rate (20°) | 71 | Baseline (FOA)
Sound Event Localization and Detection | STARSS22 | location-dependent F1-score (macro) | 21 | Baseline (FOA)
Sound Event Localization and Detection | STARSS22 | location-dependent F1-score (micro) | 0.36 | Baseline (FOA)
Sound Event Localization and Detection | STARSS22 | Class-dependent localization error | 32.2 | Baseline (MIC)
Sound Event Localization and Detection | STARSS22 | Class-dependent localization recall | 47 | Baseline (MIC)
Sound Event Localization and Detection | STARSS22 | location-dependent F1-score (macro) | 18 | Baseline (MIC)
Sound Event Localization and Detection | STARSS22 | location-dependent F1-score (micro) | 0.36 | Baseline (MIC)
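
The location-dependent metrics and the 20° threshold in the results rest on an angular distance between a predicted and a reference direction of arrival. The sketch below shows one way to compute that distance and apply the threshold. The function names, the azimuth/elevation-in-degrees convention, and the choice to treat a distance exactly equal to the threshold as a match are assumptions made for illustration. It is not the official evaluation code.

```python
import math

def angular_distance_deg(az1, el1, az2, el2):
    """Great-circle angle in degrees between two directions.

    Each direction is given as (azimuth, elevation) in degrees.
    """
    a1, e1, a2, e2 = map(math.radians, (az1, el1, az2, el2))
    cos_angle = (math.sin(e1) * math.sin(e2)
                 + math.cos(e1) * math.cos(e2) * math.cos(a1 - a2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

def is_location_match(pred, ref, threshold_deg=20.0):
    """True if the predicted direction lies within threshold_deg of the reference.

    pred and ref are (azimuth_deg, elevation_deg) pairs.
    """
    return angular_distance_deg(*pred, *ref) <= threshold_deg
```

Under this sketch, a correctly classified event predicted at (10°, 0°) against a reference at (25°, 0°) would count toward the location-dependent scores, while a prediction at (40°, 0°) would not.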

Related Papers

Spatial and Semantic Embedding Integration for Stereo Sound Event Localization and Detection in Regular Videos (2025-07-07)
Stereo sound event localization and detection based on PSELDnet pretraining and BiMamba sequence modeling (2025-06-16)
CST-former: Multidimensional Attention-based Transformer for Sound Event Localization and Detection in Real Scenes (2025-04-17)
Reverberation-based Features for Sound Event Localization and Detection with Distance Estimation (2025-04-11)
An Experimental Study on Joint Modeling for Sound Event Localization and Detection with Source Distance Estimation (2025-01-18)
MVANet: Multi-Stage Video Attention Network for Sound Event Localization and Detection with Source Distance Estimation (2024-11-21)
Class-Incremental Learning for Sound Event Localization and Detection (2024-11-19)
PSELDNets: Pre-trained Neural Networks on Large-scale Synthetic Datasets for Sound Event Localization and Detection (2024-11-10)