
DEAR

Audio · Creative Commons Attribution Non Commercial No Derivatives 4.0 International (CC BY-NC-ND 4.0)

Dataset Summary

The Deep Evaluation of Audio Representations (DEAR) dataset is a benchmark designed to assess general-purpose audio foundation models on properties critical for hearable devices. It comprises 1,158 mono audio tracks (30 s each), created by spatially mixing proprietary anechoic speech monologues with high-quality everyday acoustic scene recordings from the HOA‑SSR library. DEAR enables controlled evaluation of:

  • Context (environment type: domestic, leisure, nature, professional, transport; indoor/outdoor; stationary/transient noise)
  • Speech sources (speech presence detection; speaker count)
  • Acoustic properties (direct-to-reverberant ratio DRR, reverberation time RT60, signal‑to‑noise ratio SNR)
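Of the acoustic properties, SNR and DRR are both energy ratios expressed in decibels (speech vs. background noise, and direct vs. reverberant sound, respectively), while RT60 is the time for reverberant energy to decay by 60 dB. A minimal stdlib-only illustration of the dB ratio shared by SNR and DRR (toy numbers, not taken from the dataset):

```python
import math

def ratio_db(numerator_energy: float, denominator_energy: float) -> float:
    """Energy ratio in decibels, the form shared by SNR and DRR."""
    return 10.0 * math.log10(numerator_energy / denominator_energy)

# Toy SNR: speech energy four times the noise energy.
print(round(ratio_db(4.0, 1.0), 2))  # → 6.02
```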

All tracks are down‑mixed to a single channel at 44.1 kHz (32‑bit) and split into development and test sets with no overlap in speakers, backgrounds, or impulse responses.

Tasks

| Task Group | Task | Type | Metric |
| ------------- | ----------------------------------- | ----------- | ----------- |
| Context | 5‑way environment classification | Multi‑class | Matthews' φ |
| | Indoor vs. outdoor | Binary | Matthews' φ |
| | Stationary vs. transient noise | Binary | Matthews' φ |
| Sources | Speech presence (1 s segments) | Binary | Matthews' φ |
| | Speaker count (1 s segments) | Regression | R² |
| Acoustics | DRR (1 s segments, 1 speaker) | Regression | R² |
| | RT60 (1 s segments, 1 speaker) | Regression | R² |
| | SNR (1 s segments, 1 speaker) | Regression | R² |
| Retrospective | TUT2017 acoustic scene (15 classes) | Multi‑class | Matthews' φ |
| | LibriCount speaker count (0–10) | Regression | R² |
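The two metrics in the table can be sketched in plain Python. This is a minimal illustration only (the φ implementation below covers just the binary case); the official repository's evaluation code is authoritative:

```python
import math

def matthews_phi(y_true, y_pred):
    """Binary Matthews correlation coefficient (φ)."""
    tp = sum(t == p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def r_squared(y_true, y_pred):
    """Coefficient of determination (R²)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Toy example: speech-presence predictions on five 1 s segments.
print(round(matthews_phi([1, 0, 1, 1, 0], [1, 0, 0, 1, 0]), 3))  # → 0.667
```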

Dataset Structure

├── data/
│   ├── 00094903-4dbf-44a9-bf09-698fc361dbff.wav
│   └── …
├── development.csv
└── test.csv
  • .wav files: mono, 44.1 kHz, 32‑bit float
  • .csv files: metadata for all tasks; each row links to a .wav file via its id column
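Joining the CSV metadata to the audio files reduces to appending `.wav` to the id. A stdlib-only sketch, assuming an `id` column as described above (any other column names are hypothetical; see the code repository for the actual schema), demonstrated here on a mock layout mirroring the structure:

```python
import csv
from pathlib import Path
from tempfile import TemporaryDirectory

def wav_paths(root: Path, split: str):
    """Yield (metadata row, wav path) for development.csv or test.csv."""
    with open(root / f"{split}.csv", newline="") as f:
        for row in csv.DictReader(f):
            yield row, root / "data" / f"{row['id']}.wav"

# Mock directory layout for demonstration purposes.
with TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "data").mkdir()
    (root / "development.csv").write_text("id,speech_present\nabc123,1\n")
    for row, path in wav_paths(root, "development"):
        print(row["id"], path.name)  # → abc123 abc123.wav
```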

Usage

Visit the dedicated code repository: https://github.com/DEAR-dataset/code

Source Data

  • Speech monologues (proprietary anechoic recordings)
  • HOA‑SSR library ambisonics scenes (licensed via FORCE Technology)
  • Impulse responses for controlled reverberation

Citation

If you use DEAR in your research, please cite:

@inproceedings{groeger2025dear,
  author={Gröger, Fabian and Baumann, Pascal and Amruthalingam, Ludovic and Simon, Laurent and Giurda, Ruksana and Lionetti, Simone},
  title={Evaluation of Deep Audio Representations for Hearables},
  booktitle={ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2025},
  doi={10.1109/ICASSP49660.2025.10887737}
}

arXiv version: https://arxiv.org/abs/2502.06664

Associated Tasks

Audio Classification