Vibravox: A Dataset of French Speech Captured with Body-conduction Audio Sensors

Julien Hauret, Malo Olivier, Thomas Joubaud, Christophe Langrenne, Sarah Poirée, Véronique Zimpfer, Éric Bavu

2024-07-16

Tasks: Speech Recognition, Automatic Speech Recognition (ASR), Speaker Verification, Bandwidth Extension, Automatic Phoneme Recognition, Speech Enhancement


Abstract

Vibravox is a dataset compliant with the General Data Protection Regulation (GDPR) containing audio recordings made with five different body-conduction audio sensors: two in-ear microphones, two bone-conduction vibration pickups, and a laryngophone. The dataset also includes audio data from an airborne microphone used as a reference. The Vibravox corpus contains 45 hours per sensor of speech samples and physiological sounds recorded from 188 participants under different acoustic conditions imposed by a high-order ambisonics 3D spatializer. Annotations about the recording conditions and linguistic transcriptions are also included in the corpus. We conducted a series of experiments on various speech-related tasks, including speech recognition, speech enhancement, and speaker verification. These experiments were carried out using state-of-the-art models to evaluate and compare their performance on signals captured by the different audio sensors offered by the Vibravox dataset, with the aim of gaining a better grasp of their individual characteristics.

Results

Task	Dataset	Metric	Value	Model
Speech Recognition	VibraVox (throat microphone)	Test PER	0.073	medium wav2vec2.0
Speech Recognition	VibraVox (headset microphone)	Test PER	0.028	medium wav2vec2.0
Speech Recognition	VibraVox (forehead accelerometer)	Test PER	0.046	medium wav2vec2.0
Speech Recognition	VibraVox (soft in-ear microphone)	Test PER	0.041	medium wav2vec2.0
Speech Recognition	VibraVox (rigid in-ear microphone)	Test PER	0.045	medium wav2vec2.0
Speech Recognition	VibraVox (temple vibration pickup)	Test PER	0.142	medium wav2vec2.0
Speaker Verification	VibraVox (soft in-ear microphone)	Test EER	0.0172	ECAPA2
Speaker Verification	VibraVox (soft in-ear microphone)	Test min-DCF	0.1	ECAPA2
Speaker Verification	VibraVox (temple vibration pickup)	Test EER	0.08	ECAPA2
Speaker Verification	VibraVox (temple vibration pickup)	Test min-DCF	0.58	ECAPA2
Speaker Verification	VibraVox (rigid in-ear microphone)	Test EER	0.0316	ECAPA2
Speaker Verification	VibraVox (rigid in-ear microphone)	Test min-DCF	0.21	ECAPA2
Speaker Verification	VibraVox (forehead accelerometer)	Test EER	0.009	ECAPA2
Speaker Verification	VibraVox (forehead accelerometer)	Test min-DCF	0.06	ECAPA2
Speaker Verification	VibraVox (throat microphone)	Test EER	0.0353	ECAPA2
Speaker Verification	VibraVox (throat microphone)	Test min-DCF	0.2	ECAPA2
Speaker Verification	VibraVox (headset microphone)	Test EER	0.0026	ECAPA2
Speaker Verification	VibraVox (headset microphone)	Test min-DCF	0.02	ECAPA2
Speech Enhancement	VibraVox (forehead accelerometer)	EER (ECAPA2)	0.0183	Configurable EBEN (M=4, P=4, Q=4)
Speech Enhancement	VibraVox (forehead accelerometer)	NORESQA-MOS	4.25	Configurable EBEN (M=4, P=4, Q=4)
Speech Enhancement	VibraVox (forehead accelerometer)	PER (wav2vec2)	0.091	Configurable EBEN (M=4, P=4, Q=4)
Speech Enhancement	VibraVox (forehead accelerometer)	STOI	0.855	Configurable EBEN (M=4, P=4, Q=4)
Speech Enhancement	VibraVox (temple vibration pickup)	EER (ECAPA2)	0.1622	Configurable EBEN (M=4, P=1, Q=4)
Speech Enhancement	VibraVox (temple vibration pickup)	NORESQA-MOS	3.632	Configurable EBEN (M=4, P=1, Q=4)
Speech Enhancement	VibraVox (temple vibration pickup)	PER (wav2vec2)	0.391	Configurable EBEN (M=4, P=1, Q=4)
Speech Enhancement	VibraVox (temple vibration pickup)	STOI	0.763	Configurable EBEN (M=4, P=1, Q=4)
Speech Enhancement	VibraVox (throat microphone)	EER (ECAPA2)	0.0847	Configurable EBEN (M=4, P=2, Q=4)
Speech Enhancement	VibraVox (throat microphone)	NORESQA-MOS	3.862	Configurable EBEN (M=4, P=2, Q=4)
Speech Enhancement	VibraVox (throat microphone)	PER (wav2vec2)	0.179	Configurable EBEN (M=4, P=2, Q=4)
Speech Enhancement	VibraVox (throat microphone)	STOI	0.834	Configurable EBEN (M=4, P=2, Q=4)
Speech Enhancement	VibraVox (soft in-ear microphone)	EER (ECAPA2)	0.0488	Configurable EBEN (M=4, P=2, Q=4)
Speech Enhancement	VibraVox (soft in-ear microphone)	NORESQA-MOS	4.331	Configurable EBEN (M=4, P=2, Q=4)
Speech Enhancement	VibraVox (soft in-ear microphone)	PER (wav2vec2)	0.087	Configurable EBEN (M=4, P=2, Q=4)
Speech Enhancement	VibraVox (soft in-ear microphone)	STOI	0.868	Configurable EBEN (M=4, P=2, Q=4)
Speech Enhancement	VibraVox (rigid in-ear microphone)	EER (ECAPA2)	0.0364	Configurable EBEN (M=4, P=2, Q=4)
Speech Enhancement	VibraVox (rigid in-ear microphone)	NORESQA-MOS	4.285	Configurable EBEN (M=4, P=2, Q=4)
Speech Enhancement	VibraVox (rigid in-ear microphone)	PER (wav2vec2)	0.084	Configurable EBEN (M=4, P=2, Q=4)
Speech Enhancement	VibraVox (rigid in-ear microphone)	STOI	0.877	Configurable EBEN (M=4, P=2, Q=4)
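
The PER values in the table are fractions, so 0.073 corresponds to roughly 7.3% phoneme errors. As a reading aid, here is a minimal sketch of how a phoneme error rate is commonly computed: the Levenshtein distance between reference and predicted phoneme sequences, divided by the number of reference phonemes. The function names `edit_distance` and `phoneme_error_rate` and the corpus-level aggregation are assumptions for illustration; this listing does not state the exact scoring procedure used by the authors.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences (lists of phonemes)."""
    # prev[j] holds the distance between the reference prefix seen so far
    # and the first j tokens of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion (reference phoneme missing)
                curr[j - 1] + 1,     # insertion (extra predicted phoneme)
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(references, hypotheses):
    """Corpus-level PER: total edit operations over total reference phonemes.

    Assumption: errors are pooled over the whole corpus rather than averaged
    per utterance.
    """
    total_edits = sum(edit_distance(r, h) for r, h in zip(references, hypotheses))
    total_ref = sum(len(r) for r in references)
    if total_ref == 0:
        raise ValueError("reference set contains no phonemes")
    return total_edits / total_ref


# Illustrative (made-up) phoneme sequences, not taken from the dataset.
refs = [["b", "ɔ̃", "ʒ", "u", "ʁ"], ["m", "ɛ", "ʁ", "s", "i"]]
hyps = [["b", "ɔ̃", "ʒ", "u"], ["m", "ɛ", "ʁ", "s", "i"]]
per = phoneme_error_rate(refs, hyps)  # one deletion over ten reference phonemes
```

Pooling errors over the corpus weights long utterances more heavily than averaging per-utterance rates would; either convention is plausible, which is why the aggregation is flagged as an assumption.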
