Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


SALSA: Spatial Cue-Augmented Log-Spectrogram Features for Polyphonic Sound Event Localization and Detection

Thi Ngoc Tho Nguyen, Karn N. Watcharasupat, Ngoc Khanh Nguyen, Douglas L. Jones, Woon-Seng Gan

2021-10-01 · Sound Event Detection · Direction of Arrival Estimation · Sound Event Localization and Detection · Event Detection

Paper · PDF · Code (official)

Abstract

Sound event localization and detection (SELD) consists of two subtasks, which are sound event detection and direction-of-arrival estimation. While sound event detection mainly relies on time-frequency patterns to distinguish different sound classes, direction-of-arrival estimation uses amplitude and/or phase differences between microphones to estimate source directions. As a result, it is often difficult to jointly optimize these two subtasks. We propose a novel feature called Spatial cue-Augmented Log-SpectrogrAm (SALSA) with exact time-frequency mapping between the signal power and the source directional cues, which is crucial for resolving overlapping sound sources. The SALSA feature consists of multichannel log-spectrograms stacked along with the normalized principal eigenvector of the spatial covariance matrix at each corresponding time-frequency bin. Depending on the microphone array format, the principal eigenvector can be normalized differently to extract amplitude and/or phase differences between the microphones. As a result, SALSA features are applicable for different microphone array formats such as first-order ambisonics (FOA) and multichannel microphone array (MIC). Experimental results on the TAU-NIGENS Spatial Sound Events 2021 dataset with directional interferences showed that SALSA features outperformed other state-of-the-art features. Specifically, the use of SALSA features in the FOA format increased the F1 score and localization recall by 6% each, compared to the multichannel log-mel spectrograms with intensity vectors. For the MIC format, using SALSA features increased F1 score and localization recall by 16% and 7%, respectively, compared to using multichannel log-mel spectrograms with generalized cross-correlation spectra.
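The abstract describes the feature construction: multichannel log-spectrograms stacked with the normalized principal eigenvector of the spatial covariance matrix at each time-frequency bin. The following is a minimal numpy sketch of that recipe, not the authors' official implementation; the function name `salsa_features` is hypothetical, and the real SALSA feature includes further steps (e.g. frequency-dependent normalization and FOA-specific cue extraction) that are omitted here. The MIC-style normalization shown below keeps inter-channel phase differences relative to the first microphone.

```python
import numpy as np

def salsa_features(stft, eps=1e-8):
    """Sketch of SALSA-like features (hypothetical helper, not the official code).

    stft: complex array of shape (n_mics, n_frames, n_bins).
    Returns a real array of shape (2*n_mics - 1, n_frames, n_bins):
    multichannel log-spectrograms stacked with phase-difference cues.
    """
    n_mics, n_frames, n_bins = stft.shape

    # Multichannel log power spectrograms.
    log_spec = np.log(np.abs(stft) ** 2 + eps)

    # Spatial covariance matrix at each time-frequency bin:
    # R[t, f] = x x^H, where x is the n_mics-dim snapshot at (t, f).
    x = stft.transpose(1, 2, 0)                      # (T, F, M)
    R = x[..., :, None] * np.conj(x[..., None, :])   # (T, F, M, M), Hermitian

    # Principal eigenvector of each R. For a single-snapshot (rank-1) R this
    # is the snapshot itself up to scale; the eigendecomposition mirrors the
    # general recipe, which applies when R is averaged over neighboring bins.
    _, vecs = np.linalg.eigh(R)                      # eigenvalues ascending
    v = vecs[..., -1]                                # (T, F, M), principal

    # MIC-style normalization: remove the arbitrary global phase by dividing
    # by the first channel, then keep inter-channel phase differences.
    v = v / (v[..., :1] + eps)
    phase_cues = np.angle(v[..., 1:])                # (T, F, M-1)

    cues = phase_cues.transpose(2, 0, 1)             # (M-1, T, F)
    return np.concatenate([log_spec, cues], axis=0)
```

For a 4-channel array this yields 4 log-spectrogram channels plus 3 phase-cue channels, all sharing the same time-frequency grid — the "exact time-frequency mapping" between signal power and directional cues that the abstract highlights.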

Results

Task: Sound Event Localization and Detection
Dataset: TAU-NIGENS Spatial Sound Events 2021
Model: SALSA-FOA

Metric    Value
ER≤20°    0.376
F1≤20°    0.744
LE-CD     11.1
LR-CD     0.722

Related Papers

Spatial and Semantic Embedding Integration for Stereo Sound Event Localization and Detection in Regular Videos (2025-07-07)
Stereo sound event localization and detection based on PSELDnet pretraining and BiMamba sequence modeling (2025-06-16)
Frequency Dynamic Convolutions for Sound Event Detection (2025-06-15)
Teaching Physical Awareness to LLMs through Sounds (2025-06-10)
DiCoRe: Enhancing Zero-shot Event Detection via Divergent-Convergent LLM Reasoning (2025-06-05)
Towards real-time assessment of infrasound event detection capability using deep learning-based transmission loss estimation (2025-06-03)
DIAMOND: An LLM-Driven Agent for Context-Aware Baseball Highlight Summarization (2025-06-03)
Hybrid Disagreement-Diversity Active Learning for Bioacoustic Sound Event Detection (2025-05-27)