Natural Language Transduction on Lip Reading in the Wild
Metric: Top-1 Accuracy (higher is better)
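The page does not say how the metric is computed. As a minimal sketch, assuming the usual definition for isolated-word classification (the percentage of test clips whose highest-scoring class equals the ground-truth word), Top-1 accuracy could be calculated as below. The function name `top1_accuracy` and its argument layout are hypothetical and are not taken from any listed paper or codebase.

```python
from typing import Sequence


def top1_accuracy(scores: Sequence[Sequence[float]], labels: Sequence[int]) -> float:
    """Return the percentage of samples whose arg-max class matches the label.

    scores: one row of per-class scores per clip (assumed layout).
    labels: ground-truth class index per clip.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    correct = 0
    for row, label in zip(scores, labels):
        # Index of the highest-scoring class for this clip.
        predicted = max(range(len(row)), key=row.__getitem__)
        if predicted == label:
            correct += 1
    return 100.0 * correct / len(labels)


if __name__ == "__main__":
    # Toy example: 4 clips, 2 classes; 3 of 4 predictions match the labels.
    demo_scores = [[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]]
    demo_labels = [1, 0, 0, 0]
    print(top1_accuracy(demo_scores, demo_labels))
```

The result is returned as a percentage, on the assumption that the leaderboard values (for example 95 or 94.1) are percentages.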
Results
| # | Model | Top-1 Accuracy | Extra Data | Paper | Date | Code |
|---|-------|----------------|------------|-------|------|------|
| 1 | SyncVSR (Word Boundary) | 95 | No | SyncVSR: Data-Efficient Visual Speech Recognitio... | 2024-06-18 | Yes |
| 2 | 3D Conv + ResNet-18 + DC-TCN + KD (Ensemble & Word Boundary) | 94.1 | Yes | Training Strategies for Improved Lip-reading | 2022-09-03 | Yes |
| 3 | SyncVSR | 93.2 | No | SyncVSR: Data-Efficient Visual Speech Recognitio... | 2024-06-18 | Yes |
| 4 | AVCRFormer | 89.57 | No | - | - | Yes |
| 5 | 3D Conv + EfficientNetV2 + Transformer + TCN | 89.52 | No | - | - | - |
| 6 | Vosk + MediaPipe + LS + MixUp + SA + 3DResNet-18 + BiLSTM + Cosine WR | 88.7 | No | - | - | - |
| 7 | 3D Conv + ResNet-18 + MS-TCN + Multi-Head Visual-Audio Memory | 88.5 | No | Distinguishing Homophenes Using Multi-Head Visua... | 2022-04-04 | Yes |
| 8 | 3D Conv + ResNet-18 + MS-TCN + KD (Ensemble) | 88.5 | No | Towards Practical Lipreading with Distilled and ... | 2020-07-13 | Yes |
| 9 | 3D-ResNet + Bi-GRU + MixUp + Label Smoothing + Cosine LR (Word Boundary) | 88.4 | No | Learn an Effective Lip Reading Model without Pains | 2020-11-15 | Yes |
| 10 | 3D-ResNet + Bi-GRU + MixUp + Label Smoothing + Cosine LR | 85.5 | No | Learn an Effective Lip Reading Model without Pains | 2020-11-15 | Yes |
| 11 | 3D Conv + ResNet-18 + Bi-GRU + Visual-Audio Memory | 85.4 | No | Multi-modality Associative Bridging through Memo... | 2022-04-04 | Yes |
| 12 | 3D Conv + ResNet-18 + MS-TCN | 85.3 | No | Lipreading using Temporal Convolutional Networks | 2020-01-23 | Yes |
| 13 | 3D Conv + ResNet-18 + Bi-GRU(Face Cutout) | 85.02 | No | Can We Read Speech Beyond the Lips? Rethinking R... | 2020-03-06 | Yes |
| 14 | MoCo + Wav2Vec by SJTU LUMIA | 85 | No | Leveraging Unimodal Self-Supervised Learning for... | 2022-02-24 | Yes |
| 15 | 3D Conv + P3D-ResNet50 + TCN | 84.8 | No | Discriminative Multi-modality Speech Recognition | 2020-05-12 | Yes |
| 16 | 3D Conv + ResNet-18 + Bi-GRU | 84.41 | No | Mutual Information Maximization for Effective Li... | 2020-03-13 | Yes |
| 17 | SpotFast + Transformer + Product-Key memory | 84.4 | No | SpotFast Networks with Memory Augmented Lateral ... | 2020-05-21 | Yes |
| 18 | DFTN | 84.13 | No | Deformation Flow Based Two-Stream Network for Li... | 2020-03-12 | Yes |
| 19 | PCPG | 83.5 | No | Pseudo-Convolutional Policy Gradient for Sequenc... | 2020-03-09 | - |
| 20 | 3D Conv + ResNet-34 + Bi-GRU | 83.39 | No | End-to-end Audiovisual Speech Recognition | 2018-02-18 | Yes |
| 21 | Multi-grained + Bi-ConvLSTM | 83.34 | No | Multi-Grained Spatio-temporal Modeling for Lip-r... | 2019-08-30 | - |
| 22 | 3D Conv + ResNet-34 + Bi-LSTM | 83 | No | Combining Residual Networks with LSTMs for Lipre... | 2017-03-12 | Yes |
#1 SyncVSR (Word Boundary) · SOTA · 95 Top-1 Accuracy · 2024-06-18 · SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization · Code
#2 3D Conv + ResNet-18 + DC-TCN + KD (Ensemble & Word Boundary) · SOTA · 94.1 Top-1 Accuracy · Extra Data · 2022-09-03 · Training Strategies for Improved Lip-reading · Code
#3 SyncVSR · 93.2 Top-1 Accuracy · 2024-06-18 · SyncVSR: Data-Efficient Visual Speech Recognition with End-to-End Crossmodal Audio Token Synchronization · Code
#4 AVCRFormer · 89.57 Top-1 Accuracy · No paper · Code
#5 3D Conv + EfficientNetV2 + Transformer + TCN · 89.52 Top-1 Accuracy · No paper
#6 Vosk + MediaPipe + LS + MixUp + SA + 3DResNet-18 + BiLSTM + Cosine WR · 88.7 Top-1 Accuracy · No paper
#7 3D Conv + ResNet-18 + MS-TCN + Multi-Head Visual-Audio Memory · 88.5 Top-1 Accuracy · 2022-04-04 · Distinguishing Homophenes Using Multi-Head Visual-Audio Memory for Lip Reading · Code
#8 3D Conv + ResNet-18 + MS-TCN + KD (Ensemble) · SOTA · 88.5 Top-1 Accuracy · 2020-07-13 · Towards Practical Lipreading with Distilled and Efficient Models · Code
#9 3D-ResNet + Bi-GRU + MixUp + Label Smoothing + Cosine LR (Word Boundary) · 88.4 Top-1 Accuracy · 2020-11-15 · Learn an Effective Lip Reading Model without Pains · Code
#10 3D-ResNet + Bi-GRU + MixUp + Label Smoothing + Cosine LR · 85.5 Top-1 Accuracy · 2020-11-15 · Learn an Effective Lip Reading Model without Pains · Code
#11 3D Conv + ResNet-18 + Bi-GRU + Visual-Audio Memory · 85.4 Top-1 Accuracy · 2022-04-04 · Multi-modality Associative Bridging through Memory: Speech Sound Recollected from Face Video · Code
#12 3D Conv + ResNet-18 + MS-TCN · SOTA · 85.3 Top-1 Accuracy · 2020-01-23 · Lipreading using Temporal Convolutional Networks · Code
#13 3D Conv + ResNet-18 + Bi-GRU(Face Cutout) · 85.02 Top-1 Accuracy · 2020-03-06 · Can We Read Speech Beyond the Lips? Rethinking RoI Selection for Deep Visual Speech Recognition · Code
#14 MoCo + Wav2Vec by SJTU LUMIA · 85 Top-1 Accuracy · 2022-02-24 · Leveraging Unimodal Self-Supervised Learning for Multimodal Audio-Visual Speech Recognition · Code
#15 3D Conv + P3D-ResNet50 + TCN · 84.8 Top-1 Accuracy · 2020-05-12 · Discriminative Multi-modality Speech Recognition · Code
#16 3D Conv + ResNet-18 + Bi-GRU · 84.41 Top-1 Accuracy · 2020-03-13 · Mutual Information Maximization for Effective Lip Reading · Code
#17 SpotFast + Transformer + Product-Key memory · 84.4 Top-1 Accuracy · 2020-05-21 · SpotFast Networks with Memory Augmented Lateral Transformers for Lipreading · Code
#18 DFTN · 84.13 Top-1 Accuracy · 2020-03-12 · Deformation Flow Based Two-Stream Network for Lip Reading · Code
#19 PCPG · 83.5 Top-1 Accuracy · 2020-03-09 · Pseudo-Convolutional Policy Gradient for Sequence-to-Sequence Lip-Reading
#20 3D Conv + ResNet-34 + Bi-GRU · SOTA · 83.39 Top-1 Accuracy · 2018-02-18 · End-to-end Audiovisual Speech Recognition · Code
#21 Multi-grained + Bi-ConvLSTM · 83.34 Top-1 Accuracy · 2019-08-30 · Multi-Grained Spatio-temporal Modeling for Lip-reading
#22 3D Conv + ResNet-34 + Bi-LSTM · SOTA · 83 Top-1 Accuracy · 2017-03-12 · Combining Residual Networks with LSTMs for Lipreading · Code