ES³ Base*

Reported on 6 benchmarks across 4 tasks

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Computer Vision: 2 results

Lipreading on CAS-VSR-S101
Word Error Rate (WER)
55.6
Lipreading on LRS2
Word Error Rate (WER)
31.4
best: 14.6 (Auto-AVSR)

Natural Language Processing: 2 results

Natural Language Transduction on CAS-VSR-S101
Word Error Rate (WER)
55.6
Natural Language Transduction on LRS2
Word Error Rate (WER)
31.4
best: 14.6 (Auto-AVSR)

Audio: 1 result

Speech Recognition on CAS-VSR-S101
Word Error Rate (WER)
11.6

Speech: 1 result

Audio-Visual Speech Recognition on CAS-VSR-S101
Word Error Rate (WER)
11