EasyCom
Audio · Dialog · Images · RGB Video · Speech · Time series · Videos · CC BY-NC 4.0 · Introduced 2021-07-09
The Easy Communications (EasyCom) dataset is a world-first dataset designed to help mitigate the cocktail party effect from an augmented-reality (AR)-motivated, multi-sensor egocentric world view. The dataset contains AR glasses egocentric multi-channel microphone array audio, wide field-of-view RGB video, speech source pose, headset microphone audio, annotated voice activity, speech transcriptions, head and face bounding boxes, and source identification labels. We have created and are releasing this dataset to facilitate research in multi-modal AR solutions to the cocktail party problem.
Source: EasyCom
Benchmarks
Active Speaker Localization / ASL mAP
Image Clustering / NMI
Speech Enhancement / PESQ
Speech Enhancement / STOI
Speech Enhancement / ViSQOL
Speech Enhancement / HASQI
Speech Enhancement / Audio Quality MOS
Speech Enhancement / SDR
Speech Enhancement / ESTOI
Speech Enhancement / HASPI
Speech Enhancement / SI-SDR
Speech Enhancement / SIIB
Speech Enhancement / SNR
Speech Enhancement / SegSNR
Speech Recognition / WER (%)