ES³ Base*
Reported on 6 benchmarks across 4 tasks
Note: results are matched by exact model name. Different papers may use the same name for different model variants.
Computer Vision: 2 results
- Word Error Rate (WER): 55.6
- Word Error Rate (WER): 31.4 (best: 14.6, Auto-AVSR)
Natural Language Processing: 2 results
- Word Error Rate (WER): 55.6
- Word Error Rate (WER): 31.4 (best: 14.6, Auto-AVSR)
Audio: 1 result
- Word Error Rate (WER): 11.6
Speech: 1 result
- Word Error Rate (WER): 11