Audio-Visual Speech Recognition on CAS-VSR-S101

Metric: Word Error Rate (WER) (lower is better)
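
For reference, WER is conventionally defined as (S + D + I) / N: the number of word substitutions, deletions, and insertions needed to turn the recognized transcript into the reference, divided by the number of reference words N. The following is a minimal sketch of that definition, not the scoring script used for this leaderboard. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the benchmark's actual tokenization and text normalization are not specified here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Assumption: words are separated by whitespace. A real evaluation may
    tokenize and normalize text differently.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (Levenshtein, row by row).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the) over 6 reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, WER computed this way can exceed 1.0 when the hypothesis is much longer than the reference.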
