Metric: Top-1 Accuracy (higher is better)
| # | Model | Top-1 Accuracy (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | SyncVSR (Word Boundary) | 95 | No | SyncVSR: Data-Efficient Visual Speech Recognitio... | 2024-06-18 | Code |
| 2 | 3D Conv + ResNet-18 + DC-TCN + KD (Ensemble & Word Boundary) | 94.1 | Yes | Training Strategies for Improved Lip-reading | 2022-09-03 | Code |
| 3 | SyncVSR | 93.2 | No | SyncVSR: Data-Efficient Visual Speech Recognitio... | 2024-06-18 | Code |
| 4 | AVCRFormer | 89.57 | No | - | - | Code |
| 5 | 3D Conv + EfficientNetV2 + Transformer + TCN | 89.52 | No | - | - | - |
| 6 | Vosk + MediaPipe + LS + MixUp + SA + 3DResNet-18 + BiLSTM + Cosine WR | 88.7 | No | - | - | - |
| 7 | 3D Conv + ResNet-18 + MS-TCN + Multi-Head Visual-Audio Memory | 88.5 | No | Distinguishing Homophenes Using Multi-Head Visua... | 2022-04-04 | Code |
| 8 | 3D Conv + ResNet-18 + MS-TCN + KD (Ensemble) | 88.5 | No | Towards Practical Lipreading with Distilled and ... | 2020-07-13 | Code |
| 9 | 3D-ResNet + Bi-GRU + MixUp + Label Smoothing + Cosine LR (Word Boundary) | 88.4 | No | Learn an Effective Lip Reading Model without Pains | 2020-11-15 | Code |
| 10 | 3D-ResNet + Bi-GRU + MixUp + Label Smoothing + Cosine LR | 85.5 | No | Learn an Effective Lip Reading Model without Pains | 2020-11-15 | Code |
| 11 | 3D Conv + ResNet-18 + Bi-GRU + Visual-Audio Memory | 85.4 | No | Multi-modality Associative Bridging through Memo... | 2022-04-04 | Code |
| 12 | 3D Conv + ResNet-18 + MS-TCN | 85.3 | No | Lipreading using Temporal Convolutional Networks | 2020-01-23 | Code |
| 13 | 3D Conv + ResNet-18 + Bi-GRU (Face Cutout) | 85.02 | No | Can We Read Speech Beyond the Lips? Rethinking R... | 2020-03-06 | Code |
| 14 | MoCo + Wav2Vec by SJTU LUMIA | 85 | No | Leveraging Unimodal Self-Supervised Learning for... | 2022-02-24 | Code |
| 15 | 3D Conv + P3D-ResNet50 + TCN | 84.8 | No | Discriminative Multi-modality Speech Recognition | 2020-05-12 | Code |
| 16 | 3D Conv + ResNet-18 + Bi-GRU | 84.41 | No | Mutual Information Maximization for Effective Li... | 2020-03-13 | Code |
| 17 | SpotFast + Transformer + Product-Key memory | 84.4 | No | SpotFast Networks with Memory Augmented Lateral ... | 2020-05-21 | Code |
| 18 | DFTN | 84.13 | No | Deformation Flow Based Two-Stream Network for Li... | 2020-03-12 | Code |
| 19 | PCPG | 83.5 | No | Pseudo-Convolutional Policy Gradient for Sequenc... | 2020-03-09 | - |
| 20 | 3D Conv + ResNet-34 + Bi-GRU | 83.39 | No | End-to-end Audiovisual Speech Recognition | 2018-02-18 | Code |
| 21 | Multi-grained + Bi-ConvLSTM | 83.34 | No | Multi-Grained Spatio-temporal Modeling for Lip-r... | 2019-08-30 | - |
| 22 | 3D Conv + ResNet-34 + Bi-LSTM | 83 | No | Combining Residual Networks with LSTMs for Lipre... | 2017-03-12 | Code |
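
For readers unfamiliar with the metric, the following is a minimal sketch of how top-1 accuracy is typically computed for a word-level classifier: a sample counts as correct only when the single highest-scoring class matches the ground-truth label. The function name `top1_accuracy`, the score layout (one row of per-class scores per sample), and the toy data are assumptions made for illustration; they are not taken from any of the listed papers or their code, and the result is expressed as a percentage on the assumption that the table reports percentages.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return the percentage of samples whose highest-scoring class equals the label.

    `scores` holds one row of per-class scores per sample (hypothetical layout);
    `labels` holds the ground-truth class index for each sample.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("at least one sample is required")

    correct = 0
    for row, label in zip(scores, labels):
        # Index of the largest score; ties resolve to the lowest index.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


# Toy data for illustration only (not drawn from the leaderboard).
toy_scores = [
    [0.1, 0.7, 0.2],  # predicts class 1
    [0.6, 0.3, 0.1],  # predicts class 0
    [0.2, 0.2, 0.6],  # predicts class 2
    [0.5, 0.4, 0.1],  # predicts class 0
]
toy_labels = [1, 0, 2, 1]
toy_result = top1_accuracy(toy_scores, toy_labels)
print(f"Top-1 accuracy: {toy_result:.2f}%")
```

In this toy case three of the four predictions match their labels. Differences of a few tenths of a point between entries in the table correspond to only a handful of test samples, so the evaluation protocol (for example, whether word-boundary information or extra data was used) matters when comparing rows.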