EARS-WHAM

Speech | CC-NC 4.0 International license | Introduced 2024-06-10

The EARS-WHAM dataset mixes speech from the EARS dataset with real noise recordings from the WHAM! dataset. Speech and noise files are mixed at signal-to-noise ratios (SNRs) randomly sampled from the range [−2.5, 17.5] dB. The SNR is computed from loudness, K-weighted, relative to full scale (LKFS), as standardized in ITU-R BS.1770. This gives a more perceptually meaningful scaling and excludes silent regions from the SNR computation.
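The mixing step described above can be illustrated with a minimal sketch. It is not the dataset's actual generation code. The level measure here is a simplified stand-in for BS.1770 loudness: it averages the power of blocks above an absolute gate but applies no K-weighting filter and no relative gate. The function names, the block size, the −70 dB gate, and the uniform SNR distribution are all assumptions made for illustration.

```python
import math
import random

SNR_RANGE_DB = (-2.5, 17.5)  # range stated in the dataset description


def gated_level_db(signal, block=400, gate_db=-70.0):
    """Level in dB from the mean power of blocks above an absolute gate.

    Simplified stand-in for BS.1770 loudness (assumption): no K-weighting
    filter and no relative gate. Silent blocks are excluded from the average.
    """
    powers = []
    for start in range(0, len(signal) - block + 1, block):
        chunk = signal[start:start + block]
        power = sum(x * x for x in chunk) / block
        if power > 0 and 10 * math.log10(power) > gate_db:
            powers.append(power)
    if not powers:
        raise ValueError("signal has no blocks above the gate")
    return 10 * math.log10(sum(powers) / len(powers))


def sample_snr(rng):
    """Draw an SNR in dB; a uniform distribution is assumed here."""
    return rng.uniform(*SNR_RANGE_DB)


def mix_at_snr(speech, noise, snr_db):
    """Scale the noise so that speech level minus noise level equals snr_db."""
    noise = noise[:len(speech)]  # assumes the noise is at least as long as the speech
    gain_db = gated_level_db(speech) - gated_level_db(noise) - snr_db
    gain = 10 ** (gain_db / 20)
    scaled_noise = [gain * n for n in noise]
    mixture = [s + n for s, n in zip(speech, scaled_noise)]
    return mixture, scaled_noise
```

Because silent blocks are gated out, padding a speech file with silence does not change its measured level, so the resulting SNR reflects only the active portions of the signal. A full implementation would replace `gated_level_db` with a BS.1770-compliant loudness meter.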

Benchmarks

Speech Enhancement / PESQ-WB
Speech Enhancement / SI-SDR
Speech Enhancement / ESTOI
Speech Enhancement / SIGMOS
Speech Enhancement / DNSMOS
Speech Enhancement / POLQA