Julien Hauret, Malo Olivier, Thomas Joubaud, Christophe Langrenne, Sarah Poirée, Véronique Zimpfer, Éric Bavu
Vibravox is a dataset compliant with the General Data Protection Regulation (GDPR) containing audio recordings made with five different body-conduction audio sensors: two in-ear microphones, two bone conduction vibration pickups, and a laryngophone. The dataset also includes audio from an airborne microphone used as a reference. The Vibravox corpus contains 45 hours per sensor of speech samples and physiological sounds recorded from 188 participants under different acoustic conditions imposed by a high-order ambisonics 3D spatializer. Annotations about the recording conditions and linguistic transcriptions are also included in the corpus. We conducted a series of experiments on various speech-related tasks, including speech recognition, speech enhancement, and speaker verification. These experiments used state-of-the-art models to evaluate and compare their performance on signals captured by the different audio sensors in the Vibravox dataset, with the aim of better understanding the individual characteristics of each sensor.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Speech Recognition | VibraVox (throat microphone) | Test PER | 0.073 | medium wav2vec2.0 |
| Speech Recognition | VibraVox (headset microphone) | Test PER | 0.028 | medium wav2vec2.0 |
| Speech Recognition | VibraVox (forehead accelerometer) | Test PER | 0.046 | medium wav2vec2.0 |
| Speech Recognition | VibraVox (soft in-ear microphone) | Test PER | 0.041 | medium wav2vec2.0 |
| Speech Recognition | VibraVox (rigid in-ear microphone) | Test PER | 0.045 | medium wav2vec2.0 |
| Speech Recognition | VibraVox (temple vibration pickup) | Test PER | 0.142 | medium wav2vec2.0 |
| Speaker Verification | VibraVox (soft in-ear microphone) | Test EER | 0.0172 | ECAPA2 |
| Speaker Verification | VibraVox (soft in-ear microphone) | Test min-DCF | 0.1 | ECAPA2 |
| Speaker Verification | VibraVox (temple vibration pickup) | Test EER | 0.08 | ECAPA2 |
| Speaker Verification | VibraVox (temple vibration pickup) | Test min-DCF | 0.58 | ECAPA2 |
| Speaker Verification | VibraVox (rigid in-ear microphone) | Test EER | 0.0316 | ECAPA2 |
| Speaker Verification | VibraVox (rigid in-ear microphone) | Test min-DCF | 0.21 | ECAPA2 |
| Speaker Verification | VibraVox (forehead accelerometer) | Test EER | 0.009 | ECAPA2 |
| Speaker Verification | VibraVox (forehead accelerometer) | Test min-DCF | 0.06 | ECAPA2 |
| Speaker Verification | VibraVox (throat microphone) | Test EER | 0.0353 | ECAPA2 |
| Speaker Verification | VibraVox (throat microphone) | Test min-DCF | 0.2 | ECAPA2 |
| Speaker Verification | VibraVox (headset microphone) | Test EER | 0.0026 | ECAPA2 |
| Speaker Verification | VibraVox (headset microphone) | Test min-DCF | 0.02 | ECAPA2 |
| Speech Enhancement | VibraVox (forehead accelerometer) | EER (ECAPA2) | 0.0183 | Configurable EBEN (M=4, P=4, Q=4) |
| Speech Enhancement | VibraVox (forehead accelerometer) | Noresqa-MOS | 4.25 | Configurable EBEN (M=4, P=4, Q=4) |
| Speech Enhancement | VibraVox (forehead accelerometer) | PER (wav2vec2) | 0.091 | Configurable EBEN (M=4, P=4, Q=4) |
| Speech Enhancement | VibraVox (forehead accelerometer) | STOI | 0.855 | Configurable EBEN (M=4, P=4, Q=4) |
| Speech Enhancement | VibraVox (temple vibration pickup) | EER (ECAPA2) | 0.1622 | Configurable EBEN (M=4, P=1, Q=4) |
| Speech Enhancement | VibraVox (temple vibration pickup) | Noresqa-MOS | 3.632 | Configurable EBEN (M=4, P=1, Q=4) |
| Speech Enhancement | VibraVox (temple vibration pickup) | PER (wav2vec2) | 0.391 | Configurable EBEN (M=4, P=1, Q=4) |
| Speech Enhancement | VibraVox (temple vibration pickup) | STOI | 0.763 | Configurable EBEN (M=4, P=1, Q=4) |
| Speech Enhancement | VibraVox (throat microphone) | EER (ECAPA2) | 0.0847 | Configurable EBEN (M=4, P=2, Q=4) |
| Speech Enhancement | VibraVox (throat microphone) | Noresqa-MOS | 3.862 | Configurable EBEN (M=4, P=2, Q=4) |
| Speech Enhancement | VibraVox (throat microphone) | PER (wav2vec2) | 0.179 | Configurable EBEN (M=4, P=2, Q=4) |
| Speech Enhancement | VibraVox (throat microphone) | STOI | 0.834 | Configurable EBEN (M=4, P=2, Q=4) |
| Speech Enhancement | VibraVox (soft in-ear microphone) | EER (ECAPA2) | 0.0488 | Configurable EBEN (M=4, P=2, Q=4) |
| Speech Enhancement | VibraVox (soft in-ear microphone) | Noresqa-MOS | 4.331 | Configurable EBEN (M=4, P=2, Q=4) |
| Speech Enhancement | VibraVox (soft in-ear microphone) | PER (wav2vec2) | 0.087 | Configurable EBEN (M=4, P=2, Q=4) |
| Speech Enhancement | VibraVox (soft in-ear microphone) | STOI | 0.868 | Configurable EBEN (M=4, P=2, Q=4) |
| Speech Enhancement | VibraVox (rigid in-ear microphone) | EER (ECAPA2) | 0.0364 | Configurable EBEN (M=4, P=2, Q=4) |
| Speech Enhancement | VibraVox (rigid in-ear microphone) | Noresqa-MOS | 4.285 | Configurable EBEN (M=4, P=2, Q=4) |
| Speech Enhancement | VibraVox (rigid in-ear microphone) | PER (wav2vec2) | 0.084 | Configurable EBEN (M=4, P=2, Q=4) |
| Speech Enhancement | VibraVox (rigid in-ear microphone) | STOI | 0.877 | Configurable EBEN (M=4, P=2, Q=4) |
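
The PER values in the table are phoneme error rates. As a rough illustration of how such a score is usually obtained, the sketch below computes a corpus-level PER as the total Levenshtein distance between reference and predicted phoneme sequences, divided by the total number of reference phonemes. The function names and the example phoneme sequences are hypothetical, and whether the reported numbers aggregate errors over the whole test set or average per utterance is an assumption not confirmed by the source.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance (substitutions, insertions, deletions all cost 1)."""
    # prev[j] holds the distance between the first i-1 items of ref
    # and the first j items of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if r == h else 1)
            deletion = prev[j] + 1       # item of ref missing from hyp
            insertion = curr[j - 1] + 1  # extra item in hyp
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def phoneme_error_rate(
    references: Sequence[Sequence[str]],
    hypotheses: Sequence[Sequence[str]],
) -> float:
    """Corpus-level PER: total edit errors / total reference phonemes.

    Assumption: errors are pooled over all utterances before dividing.
    """
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_ref = sum(len(ref) for ref in references)
    if total_ref == 0:
        raise ValueError("references contain no phonemes")
    total_errors = sum(
        edit_distance(ref, hyp) for ref, hyp in zip(references, hypotheses)
    )
    return total_errors / total_ref


# Hypothetical example: one phoneme of a five-phoneme reference is dropped.
example_ref = [["b", "ɔ̃", "ʒ", "u", "ʁ"]]
example_hyp = [["b", "ɔ̃", "ʒ", "u"]]
example_per = phoneme_error_rate(example_ref, example_hyp)  # 1 / 5 = 0.2
```

Under this definition a PER of 0.028 means roughly 3 phoneme errors per 100 reference phonemes, which gives a sense of scale when comparing sensors in the table.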