EARS-Reverb

License: CC-NC 4.0 International · Introduced: 2024-06-10

The EARS-Reverb dataset uses real recorded room impulse responses (RIRs) from several public datasets (ACE-Challenge, AIR, ARNI, BRUDEX, dEchorate, DetmoldSRIR, and Palimpsest). All RIRs are fullband; for multi-channel recordings, one channel is selected at random. Reverberant speech is generated by convolving the clean speech with the RIR. To avoid a time delay between the reverberant and clean signals caused by the direct path of the RIR, the beginning of the RIR is cut off up to the index of its highest amplitude. Only RIRs with an RT60 reverberation time of at most 2 s are used. Finally, the loudness of the reverberant speech is normalized to that of the clean speech using loudness, K-weighted, relative to full scale (LKFS).
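The generation pipeline described above can be sketched as follows. This is a minimal NumPy illustration, not the dataset's actual code: the function name `reverberate` is hypothetical, and the loudness matching uses a simple RMS ratio as a stand-in for the ITU-R BS.1770 LKFS measurement used by the dataset.

```python
import numpy as np

def reverberate(clean: np.ndarray, rir: np.ndarray) -> np.ndarray:
    """Sketch of the described pipeline: trim the RIR's direct-path
    delay, convolve, and loudness-match to the clean signal."""
    # Cut off the beginning of the RIR up to the index of its highest
    # amplitude, so the reverberant signal is not delayed w.r.t. clean.
    onset = int(np.argmax(np.abs(rir)))
    rir = rir[onset:]
    # Convolve clean speech with the trimmed RIR; truncate to the
    # clean-signal length so the pair stays time-aligned.
    reverb = np.convolve(clean, rir)[: len(clean)]
    # Normalize loudness to the clean signal. The dataset uses LKFS
    # (ITU-R BS.1770); plain RMS matching here is a simplification.
    rms = lambda x: np.sqrt(np.mean(x ** 2) + 1e-12)
    return reverb * (rms(clean) / rms(reverb))
```

A caller would additionally filter the RIR pool for RT60 ≤ 2 s before invoking this function; RT60 estimation itself is outside the scope of this sketch.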