Metric: Test WER (lower is better)
| # | Model | Test WER | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Whisper-Flamingo | 1.4 | Yes | Whisper-Flamingo: Integrating Visual Features in... | 2024-06-14 | Code |
| 2 | CTC/Attention | 1.5 | Yes | Auto-AVSR: Audio-Visual Speech Recognition with ... | 2023-03-25 | Code |
| 3 | MoCo + wav2vec (w/o extLM) | 2.6 | No | Leveraging Unimodal Self-Supervised Learning for... | 2022-02-24 | Code |
| 4 | End2end Conformer | 3.7 | No | End-to-end Audio-visual Speech Recognition with ... | 2021-02-12 | Code |
| 5 | LF-MMI TDNN | 5.9 | No | Audio-visual Recognition of Overlapped speech fo... | 2020-01-06 | - |
| 6 | CTC/Attention | 7.0 | No | Audio-Visual Speech Recognition With A Hybrid CT... | 2018-09-28 | - |
| 7 | TM-CTC | 8.2 | No | Deep Audio-Visual Speech Recognition | 2018-09-06 | Code |
| 8 | TM-Seq2seq | 8.5 | No | Deep Audio-Visual Speech Recognition | 2018-09-06 | Code |
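
For readers unfamiliar with the metric, word error rate (WER) is conventionally the word-level edit distance between a reference transcript and a system hypothesis, divided by the number of reference words. The sketch below is a minimal illustration of that standard definition, not the scoring script used by any paper in the table. It assumes the scores above are percentages and that words are split on whitespace; neither detail is stated in the table, and the function name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage: 100 * (S + D + I) / N.

    S, D, I are the substitutions, deletions, and insertions in a
    minimum-cost word alignment; N is the number of reference words.
    Assumption: plain whitespace tokenization, no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One deleted word out of six reference words.
    score = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"{score:.2f}")  # prints 16.67
```

Because insertions count as errors, WER can exceed 100 when the hypothesis is much longer than the reference. Published scores also depend on each paper's text normalization (casing, punctuation), so small differences between rows may partly reflect scoring conventions.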