Natural Language Transduction on LRS3-TED
Metric: Word Error Rate (WER) (lower is better)
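For readers unfamiliar with the metric, here is a minimal sketch of how WER is commonly computed: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate`, the whitespace tokenization, and the choice to return a percentage are assumptions made for illustration. Individual papers on this leaderboard may normalize text differently, and they usually aggregate errors over the whole test set rather than averaging per utterance.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage for one reference/hypothesis pair.

    Illustrative only: tokenizes on whitespace and applies no text
    normalization (casing, punctuation), which real evaluations often do.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

Because the denominator is the reference length, WER can exceed 100 when the hypothesis contains many inserted words. Lower values are better, matching the ordering of the results here.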
Results
| # | Model | WER (%) | Extra Data | Paper | Date | Code |
|---|-------|---------|------------|-------|------|------|
| 1 | LP + Conformer | 12.8 | Yes | Conformers are All You Need for Visual Speech Re... | 2023-02-17 | - |
| 2 | Auto-AVSR | 19.1 | Yes | Auto-AVSR: Audio-Visual Speech Recognition with ... | 2023-03-25 | Yes |
| 3 | SyncVSR | 21.5 | Yes | SyncVSR: Data-Efficient Visual Speech Recognitio... | 2024-06-18 | Yes |
| 4 | USR (self + semi-supervised) | 21.5 | Yes | Unified Speech Recognition: A Single Model for A... | 2024-11-04 | Yes |
| 5 | USR (self-supervised) | 22.3 | Yes | Unified Speech Recognition: A Single Model for A... | 2024-11-04 | Yes |
| 6 | RAVEn Large | 23.4 | Yes | Jointly Learning Visual and Auditory Speech Repr... | 2022-12-12 | Yes |
| 7 | VSP-LLM | 25.4 | Yes | Where Visual Speech Meets Language: VSP-LLM Fram... | 2024-02-23 | Yes |
| 8 | AV-HuBERT Large + Relaxed Attention + LM | 25.51 | Yes | Relaxed Attention for Transformer Models | 2022-09-20 | Yes |
| 9 | DistillAV | 26.2 | Yes | Audio-Visual Representation Learning via Knowled... | 2025-02-09 | Yes |
| 10 | AV-HuBERT Large | 26.9 | Yes | Learning Audio-Visual Speech Representation by M... | 2022-01-05 | Yes |
| 11 | VTP (more data) | 30.7 | Yes | Sub-word Level Lip Reading With Visual Attention | 2021-10-14 | - |
| 12 | SyncVSR | 31.2 | No | SyncVSR: Data-Efficient Visual Speech Recognitio... | 2024-06-18 | Yes |
| 13 | CTC/Attention (LRW+LRS2/3+AVSpeech) | 31.5 | Yes | Visual Speech Recognition for Multiple Languages... | 2022-02-26 | Yes |
| 14 | RNN-T | 33.6 | Yes | Recurrent Neural Network Transducer for Audio-Vi... | 2019-11-08 | Yes |
| 15 | ES³ Large | 37.1 | No | - | - | - |
| 16 | ES³ Base | 40.3 | No | - | - | - |
| 17 | VTP | 40.6 | Yes | Sub-word Level Lip Reading With Visual Attention | 2021-10-14 | - |
| 18 | Hyb + Conformer | 43.3 | Yes | End-to-end Audio-visual Speech Recognition with ... | 2021-02-12 | Yes |
| 19 | CTC-V2P | 55.1 | Yes | Large-Scale Visual Speech Recognition | 2018-07-13 | - |
| 20 | EG-seq2seq | 57.8 | No | Discriminative Multi-modality Speech Recognition | 2020-05-12 | Yes |
| 21 | TM-seq2seq | 58.9 | Yes | Deep Audio-Visual Speech Recognition | 2018-09-06 | Yes |
| 22 | CTC + KD | 59.8 | Yes | ASR is all you need: cross-modal distillation fo... | 2019-11-28 | - |
| 23 | Conv-seq2seq | 60.1 | Yes | - | - | - |