Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Audio-Visual Speech Recognition on LRS3-TED

Metric: Word Error Rate (WER) (lower is better)
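For readers unfamiliar with the metric, the following is a minimal sketch of the standard WER definition: the word-level edit distance (substitutions, deletions, insertions) between a hypothesis and a reference transcript, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the page does not state what text normalization the listed papers apply before scoring, nor whether the reported values are percentages.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); tokenizes on whitespace only
    and applies no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution ("sat" -> "sit") and one deletion ("the"): 2 errors / 6 words.
    print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

Because insertions count as errors, this ratio can exceed 1.0 when the hypothesis is much longer than the reference.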

Results

| # | Model | Word Error Rate (WER) | Extra Data | Paper | Date | Code |
|---|-------|-----------------------|------------|-------|------|------|
| 1 | MMS-LLaMA | 0.74 | Yes | MMS-LLaMA: Efficient LLM-based Audio-Visual Spee... | 2025-03-14 | Code |
| 2 | Whisper-Flamingo | 0.76 | Yes | Whisper-Flamingo: Integrating Visual Features in... | 2024-06-14 | Code |
| 3 | Llama-AVSR | 0.77 | Yes | Large Language Models are Strong Audio-Visual Sp... | 2024-09-18 | Code |
| 4 | CTC/Attention | 0.9 | Yes | Auto-AVSR: Audio-Visual Speech Recognition with ... | 2023-03-25 | Code |
| 5 | DistillAV | 1.3 | Yes | Audio-Visual Representation Learning via Knowled... | 2025-02-09 | Code |
| 6 | AV-HuBERT Large | 1.4 | Yes | Robust Self-Supervised Audio-Visual Speech Recog... | 2022-01-05 | Code |
| 7 | RAVEn Large | 1.4 | Yes | Jointly Learning Visual and Auditory Speech Repr... | 2022-12-12 | Code |
| 8 | Zero-AVSR | 1.5 | Yes | Zero-AVSR: Zero-Shot Audio-Visual Speech Recogni... | 2025-03-08 | Code |
| 9 | Hyb-Conformer | 2.3 | No | End-to-end Audio-visual Speech Recognition with ... | 2021-02-12 | Code |
| 10 | RNN-T | 4.5 | Yes | Recurrent Neural Network Transducer for Audio-Vi... | 2019-11-08 | Code |
| 11 | EG-seq2seq | 6.8 | Yes | Discriminative Multi-modality Speech Recognition | 2020-05-12 | Code |
| 12 | TM-seq2seq | 7.2 | Yes | Deep Audio-Visual Speech Recognition | 2018-09-06 | Code |