Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Natural Language Transduction on LRS3-TED

Metric: Word Error Rate (WER) (lower is better)
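For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions + deletions + insertions) between a reference transcript and a model's hypothesis, divided by the number of reference words. The snippet below is a minimal sketch of that standard definition, not the scoring script used by any paper on this leaderboard. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, and real evaluations usually normalize case and punctuation first. The leaderboard values are presumably this ratio expressed as a percentage.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a fraction: word-level edit distance / reference length.

    Hypothetical helper for illustration; tokenizes on whitespace only.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (sat -> sit) and one deletion (the): 2 errors over 6 words.
    wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
    print(f"WER: {100 * wer:.1f}%")
```

Because insertions count as errors, WER can exceed 100% when the hypothesis is much longer than the reference.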


Results

| # | Model | Word Error Rate (WER) | Extra Data | Paper | Date | Code |
|---|-------|-----------------------|------------|-------|------|------|
| 1 | LP + Conformer | 12.8 | Yes | Conformers are All You Need for Visual Speech Re... | 2023-02-17 | - |
| 2 | Auto-AVSR | 19.1 | Yes | Auto-AVSR: Audio-Visual Speech Recognition with ... | 2023-03-25 | Code |
| 3 | SyncVSR | 21.5 | Yes | SyncVSR: Data-Efficient Visual Speech Recognitio... | 2024-06-18 | Code |
| 4 | USR (self + semi-supervised) | 21.5 | Yes | Unified Speech Recognition: A Single Model for A... | 2024-11-04 | Code |
| 5 | USR (self-supervised) | 22.3 | Yes | Unified Speech Recognition: A Single Model for A... | 2024-11-04 | Code |
| 6 | RAVEn Large | 23.4 | Yes | Jointly Learning Visual and Auditory Speech Repr... | 2022-12-12 | Code |
| 7 | VSP-LLM | 25.4 | Yes | Where Visual Speech Meets Language: VSP-LLM Fram... | 2024-02-23 | Code |
| 8 | AV-HuBERT Large + Relaxed Attention + LM | 25.51 | Yes | Relaxed Attention for Transformer Models | 2022-09-20 | Code |
| 9 | DistillAV | 26.2 | Yes | Audio-Visual Representation Learning via Knowled... | 2025-02-09 | Code |
| 10 | AV-HuBERT Large | 26.9 | Yes | Learning Audio-Visual Speech Representation by M... | 2022-01-05 | Code |
| 11 | VTP (more data) | 30.7 | Yes | Sub-word Level Lip Reading With Visual Attention | 2021-10-14 | - |
| 12 | SyncVSR | 31.2 | No | SyncVSR: Data-Efficient Visual Speech Recognitio... | 2024-06-18 | Code |
| 13 | CTC/Attention (LRW+LRS2/3+AVSpeech) | 31.5 | Yes | Visual Speech Recognition for Multiple Languages... | 2022-02-26 | Code |
| 14 | RNN-T | 33.6 | Yes | Recurrent Neural Network Transducer for Audio-Vi... | 2019-11-08 | Code |
| 15 | ES³ Large | 37.1 | No | - | - | - |
| 16 | ES³ Base | 40.3 | No | - | - | - |
| 17 | VTP | 40.6 | Yes | Sub-word Level Lip Reading With Visual Attention | 2021-10-14 | - |
| 18 | Hyb + Conformer | 43.3 | Yes | End-to-end Audio-visual Speech Recognition with ... | 2021-02-12 | Code |
| 19 | CTC-V2P | 55.1 | Yes | Large-Scale Visual Speech Recognition | 2018-07-13 | - |
| 20 | EG-seq2seq | 57.8 | No | Discriminative Multi-modality Speech Recognition | 2020-05-12 | Code |
| 21 | TM-seq2seq | 58.9 | Yes | Deep Audio-Visual Speech Recognition | 2018-09-06 | Code |
| 22 | CTC + KD | 59.8 | Yes | ASR is all you need: cross-modal distillation fo... | 2019-11-28 | - |
| 23 | Conv-seq2seq | 60.1 | Yes | - | - | - |