The Deep Evaluation of Audio Representations (DEAR) dataset is a benchmark designed to assess general-purpose audio foundation models on properties critical for hearable devices. It comprises 1,158 mono audio tracks (30 s each), created by spatially mixing proprietary anechoic speech monologues with high-quality everyday acoustic scene recordings from the HOA‑SSR library. DEAR enables controlled evaluation of acoustic context, speech sources, and room acoustics.
All tracks are down‑mixed to a single channel at 44.1 kHz (32‑bit) and split into development and test sets with no overlap in speakers, backgrounds, or impulse responses.
| Task Group    | Task                                | Type        | Metric    |
| ------------- | ----------------------------------- | ----------- | --------- |
| Context       | 5‑way environment classification    | Multi‑class | Matthews' |
|               | Indoor vs. outdoor                  | Binary      | Matthews' |
|               | Stationary vs. transient noise      | Binary      | Matthews' |
| Sources       | Speech presence (1 s segments)      | Binary      | Matthews' |
|               | Speaker count (1 s segments)        | Regression  |           |
| Acoustics     | DRR (1 s segments, 1 speaker)       | Regression  |           |
|               | RT60 (1 s segments, 1 speaker)      | Regression  |           |
|               | SNR (1 s segments, 1 speaker)       | Regression  |           |
| Retrospective | TUT2017 acoustic scene (15 classes) | Multi‑class | Matthews' |
|               | LibriCount speaker count (0–10)     | Regression  |           |
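The classification tasks above are scored with Matthews' correlation coefficient (MCC). As a minimal sketch of the metric (not the benchmark's official evaluation code, which lives in the code repository), here is a pure-Python implementation for the binary case:

```python
from math import sqrt

def mcc(y_true, y_pred):
    """Matthews' correlation coefficient for binary labels (0/1).

    Returns a value in [-1, 1]; by convention 0.0 is returned when any
    marginal count is zero (degenerate confusion matrix).
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / sqrt(denom)
```

Unlike accuracy, MCC stays informative under class imbalance, which matters for tasks such as speech presence where one class can dominate a track.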
```
├── data/
│   ├── 00094903-4dbf-44a9-bf09-698fc361dbff.wav
│   └── …
├── development.csv
└── test.csv
```
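Each split CSV lists per-track metadata alongside the wav files in `data/`. A minimal sketch for pairing metadata rows with their audio paths, assuming a track-identifier column (hypothetically named `id` here) whose value matches the wav filename stem; consult the code repository for the actual column layout:

```python
import csv
from pathlib import Path

def index_split(csv_path, data_dir="data"):
    """Map each metadata row of a split CSV to its wav file path.

    Assumes the CSV has a track-identifier column named 'id' (an
    assumption; the real column names are defined by the dataset).
    Returns a list of (wav_path, row_dict) pairs.
    """
    root = Path(csv_path).parent
    with open(csv_path, newline="") as f:
        rows = list(csv.DictReader(f))
    return [(root / data_dir / f"{row['id']}.wav", row) for row in rows]
```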
Visit the dedicated code repository: https://github.com/DEAR-dataset/code
If you use DEAR in your research, please cite:
```bibtex
@inproceedings{groeger2025dear,
  author={Gröger, Fabian and Baumann, Pascal and Amruthalingam, Ludovic and Simon, Laurent and Giurda, Ruksana and Lionetti, Simone},
  booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={Evaluation of Deep Audio Representations for Hearables},
  year={2025},
  doi={10.1109/ICASSP49660.2025.10887737}
}
```
arXiv version: https://arxiv.org/abs/2502.06664