Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Lipreading using Temporal Convolutional Networks

Brais Martinez, Pingchuan Ma, Stavros Petridis, Maja Pantic

2020-01-23 · Lipreading · Lip Reading

Abstract

Lip-reading has attracted a lot of research attention lately thanks to advances in deep learning. The current state-of-the-art model for recognition of isolated words in-the-wild consists of a residual network and Bidirectional Gated Recurrent Unit (BGRU) layers. In this work, we address the limitations of this model and propose changes which further improve its performance. Firstly, the BGRU layers are replaced with Temporal Convolutional Networks (TCN). Secondly, we greatly simplify the training procedure, which allows us to train the model in a single stage. Thirdly, we show that the current state-of-the-art methodology produces models that do not generalize well to variations in sequence length, and we address this issue by proposing a variable-length augmentation. We present results on the largest publicly available datasets for isolated word recognition in English and Mandarin, LRW and LRW1000, respectively. Our proposed model yields absolute improvements of 1.2% and 3.2%, respectively, on these datasets, setting a new state of the art.
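The abstract names a variable-length augmentation but does not spell out its details. The sketch below shows one plausible reading, assuming the augmentation amounts to cropping a random contiguous sub-clip from each training sequence so the model sees inputs of varying length. The function name `variable_length_crop`, the `min_len` parameter, and the 29-frame example clip are illustrative assumptions, not taken from the paper or its code.

```python
import random
from typing import List, Optional, Sequence, TypeVar

Frame = TypeVar("Frame")


def variable_length_crop(
    frames: Sequence[Frame],
    min_len: int,
    rng: Optional[random.Random] = None,
) -> List[Frame]:
    """Return a random contiguous sub-clip of `frames`.

    Hypothetical helper: the output length is drawn uniformly from
    [min_len, len(frames)], and the start offset is drawn uniformly
    from the positions where a clip of that length still fits.
    """
    rng = rng or random.Random()
    n = len(frames)
    if n == 0:
        return []
    # Clamp min_len so short clips are returned whole rather than failing.
    min_len = max(1, min(min_len, n))
    length = rng.randint(min_len, n)      # inclusive on both ends
    start = rng.randint(0, n - length)    # inclusive on both ends
    return list(frames[start:start + length])


# Usage: frame indices stand in for image frames of a hypothetical 29-frame clip.
clip = list(range(29))
augmented = variable_length_crop(clip, min_len=20, rng=random.Random(0))
```

Because the crop is contiguous, frame order is preserved; only the clip boundaries change between training iterations. A model with a fixed-length input would additionally need padding, which this sketch leaves out.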

Results

| Task | Dataset | Metric | Value | Model |
| --- | --- | --- | --- | --- |
| Lipreading | Lip Reading in the Wild | Top-1 Accuracy | 85.3 | 3D Conv + ResNet-18 + MS-TCN |
| Natural Language Transduction | Lip Reading in the Wild | Top-1 Accuracy | 85.3 | 3D Conv + ResNet-18 + MS-TCN |

Related Papers

VisualSpeaker: Visually-Guided 3D Avatar Lip Synthesis (2025-07-08)
Learning Speaker-Invariant Visual Features for Lipreading (2025-06-09)
UniCUE: Unified Recognition and Generation Framework for Chinese Cued Speech Video-to-Speech Generation (2025-06-04)
OXSeg: Multidimensional attention UNet-based lip segmentation using semi-supervised lip contours (2025-05-08)
SwinLip: An Efficient Visual Speech Encoder for Lip Reading Using Swin Transformer (2025-05-07)
Transforming faces into video stories -- VideoFace2.0 (2025-05-04)
Development and evaluation of a deep learning algorithm for German word recognition from lip movements (2025-04-22)
Chinese-LiPS: A Chinese audio-visual speech recognition dataset with Lip-reading and Presentation Slides (2025-04-21)