Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Natural Language Transduction on LRS2

Metric: Word Error Rate (WER) (lower is better)
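
For readers unfamiliar with the metric, WER is conventionally defined as the word-level edit distance (substitutions + deletions + insertions) between a hypothesis and its reference transcript, divided by the number of reference words. The sketch below illustrates that standard definition and scales it to a percentage to match the magnitudes in this leaderboard. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the listed papers may apply their own text normalization and typically aggregate errors over a whole test set rather than per sentence.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER as a percentage for one reference/hypothesis pair.

    Illustrative sketch only: tokenizes on whitespace and applies no
    text normalization (casing, punctuation), which is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return 100.0 * prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on a mat"))
```

Because insertions count as errors while the denominator is the reference length, WER can exceed 100 when the hypothesis is much longer than the reference.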

Results

| # | Model | Word Error Rate (WER) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | Auto-AVSR | 14.6 | Yes | Auto-AVSR: Audio-Visual Speech Recognition with ... | 2023-03-25 | Code |
| 2 | USR | 15.4 | Yes | Unified Speech Recognition: A Single Model for A... | 2024-11-04 | Code |
| 3 | SyncVSR | 16.5 | Yes | SyncVSR: Data-Efficient Visual Speech Recognitio... | 2024-06-18 | Code |
| 4 | RAVEn Large | 18.6 | Yes | Jointly Learning Visual and Auditory Speech Repr... | 2022-12-12 | Code |
| 5 | VTP (more data) | 22.6 | Yes | Sub-word Level Lip Reading With Visual Attention | 2021-10-14 | - |
| 6 | ES³ Large + extLM | 24.6 | Yes | - | - | - |
| 7 | CTC/Attention (LRW+LRS2/3+AVSpeech) | 25.5 | Yes | Visual Speech Recognition for Multiple Languages... | 2022-02-26 | Code |
| 8 | ES³ Large | 26.7 | Yes | - | - | - |
| 9 | ES³ Base + extLM | 28.7 | Yes | - | - | - |
| 10 | VTP | 28.9 | Yes | Sub-word Level Lip Reading With Visual Attention | 2021-10-14 | - |
| 11 | SyncVSR | 28.9 | No | SyncVSR: Data-Efficient Visual Speech Recognitio... | 2024-06-18 | Code |
| 12 | ES³ Base* + extLM | 29.3 | No | - | - | - |
| 13 | ES³ Base | 30.7 | Yes | - | - | - |
| 14 | ES³ Base* | 31.4 | No | - | - | - |
| 15 | CTC/Attention | 32.9 | No | Visual Speech Recognition for Multiple Languages... | 2022-02-26 | Code |
| 16 | Hybrid CTC / Attention | 39.1 | No | End-to-end Audio-visual Speech Recognition with ... | 2021-02-12 | Code |
| 17 | MoCo + wav2vec (w/o extLM) | 43.2 | No | Leveraging Unimodal Self-Supervised Learning for... | 2022-02-24 | Code |
| 18 | Multi-head Visual-Audio Memory | 44.5 | Yes | Distinguishing Homophenes Using Multi-Head Visua... | 2022-04-04 | Code |
| 19 | TM-seq2seq + extLM | 48.3 | Yes | Deep Audio-Visual Speech Recognition | 2018-09-06 | Code |
| 20 | LF-MMI TDNN | 48.86 | Yes | Audio-visual Recognition of Overlapped speech fo... | 2020-01-06 | - |
| 21 | Hybrid CTC / Attention | 50 | No | Audio-Visual Speech Recognition With A Hybrid CT... | 2018-09-28 | - |
| 22 | Conv-seq2seq | 51.7 | Yes | - | - | - |
| 23 | CTC + KD ASR | 53.2 | Yes | ASR is all you need: cross-modal distillation fo... | 2019-11-28 | - |
| 24 | TM-CTC + extLM | 54.7 | Yes | Deep Audio-Visual Speech Recognition | 2018-09-06 | Code |
| 25 | LIBS | 65.29 | No | Hearing Lips: Improving Lip Reading by Distillin... | 2019-11-26 | Code |
| 26 | SyncVSR | 74.6 | No | SyncVSR: Data-Efficient Visual Speech Recognitio... | 2024-06-18 | Code |