ES³ Large

Reported on 4 benchmarks across 2 tasks

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Computer Vision: 2 results

Lipreading on LRS2
Word Error Rate (WER) · uses extra data
26.7
best: 14.6 (Auto-AVSR)
Lipreading on LRS3-TED
Word Error Rate (WER)
37.1
best: 12.8 (LP + Conformer)

Natural Language Processing: 2 results

Natural Language Transduction on LRS2
Word Error Rate (WER) · uses extra data
26.7
best: 14.6 (Auto-AVSR)
Natural Language Transduction on LRS3-TED
Word Error Rate (WER)
37.1
best: 12.8 (LP + Conformer)