VTP with more data

Reported on 4 benchmarks across 2 tasks · 1 paper · 4 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Audio: 2 results

Speech Recognition on LRS3-TED
Word Error Rate (WER) · uses extra data · 2021-10-14
30.7
best: 0.68 (Whisper)
SOTA
Sub-word Level Lip Reading With Visual Attention arXiv:2110.07603
Speech Recognition on LRS2
Word Error Rate (WER) · uses extra data · 2021-10-14
22.6
best: 2.1 (RAVEn Large)
SOTA
Sub-word Level Lip Reading With Visual Attention arXiv:2110.07603

Speech: 2 results

Visual Speech Recognition on LRS3-TED
Word Error Rate (WER) · uses extra data · 2021-10-14
30.7
best: 19.1 (CTC/Attention)
SOTA
Sub-word Level Lip Reading With Visual Attention arXiv:2110.07603
Visual Speech Recognition on LRS2
Word Error Rate (WER) · uses extra data · 2021-10-14
22.6
SOTA
Sub-word Level Lip Reading With Visual Attention arXiv:2110.07603