Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Lip2Wav

Reported on 75 benchmarks across 4 tasks · 1 paper · 75 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
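
The note above can be illustrated with a minimal Python sketch. It assumes that "exact model name" means a plain, case-sensitive string comparison with no normalization; the site's actual matching code is not shown on this page, so the function `group_by_exact_name`, the row layout, and the variant name `Lip2Wav (multi-speaker)` are all hypothetical and used for illustration only.

```python
from collections import defaultdict


def group_by_exact_name(rows):
    """Group result rows by the exact, case-sensitive model name string."""
    groups = defaultdict(list)
    for row in rows:
        # No trimming, lower-casing, or variant detection: the key is the raw name.
        groups[row["model"]].append(row)
    return dict(groups)


rows = [
    # Two rows taken from this page.
    {"model": "Lip2Wav", "benchmark": "LRW", "metric": "STOI", "value": 0.543},
    {"model": "Lip2Wav", "benchmark": "LRW", "metric": "PESQ", "value": 1.197},
    # Hypothetical variant name, invented for illustration; no value is claimed.
    {"model": "Lip2Wav (multi-speaker)", "benchmark": "LRW", "metric": "STOI", "value": None},
]

groups = group_by_exact_name(rows)
for name in sorted(groups):
    print(name, len(groups[name]))
```

Under this assumption, a differently spelled variant lands in its own group, while two papers that both report a model called simply "Lip2Wav" would be merged on one page even if the underlying models differ.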

Audio · 24 results

  • Speech Recognition on LRW
    ESTOI · 2020-05-17
    0.344
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Speech Recognition on LRW
    PESQ · 2020-05-17
    1.197
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Speech Recognition on LRW
    STOI · 2020-05-17
    0.543
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Speech Recognition on Lip2Wav (EH)
    ESTOI · 2020-05-17
    0.22
    best: 0.304 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Speech Recognition on Lip2Wav (EH)
    PESQ · 2020-05-17
    1.367
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Speech Recognition on Lip2Wav (EH)
    STOI · 2020-05-17
    0.369
    best: 0.463 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Speech Recognition on Lip2Wav (Chess)
    ESTOI · 2020-05-17
    0.29
    best: 0.334 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Speech Recognition on Lip2Wav (Chess)
    PESQ · 2020-05-17
    1.4
    best: 1.503 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Speech Recognition on Lip2Wav (Chess)
    STOI · 2020-05-17
    0.418
    best: 0.506 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Speech Recognition on Lip2Wav (DL)
    ESTOI · 2020-05-17
    0.183
    best: 0.402 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Speech Recognition on Lip2Wav (DL)
    PESQ · 2020-05-17
    1.671
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Speech Recognition on Lip2Wav (DL)
    STOI · 2020-05-17
    0.282
    best: 0.576 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Speech Recognition on Lip2Wav (HS)
    ESTOI · 2020-05-17
    0.311
    best: 0.337 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Speech Recognition on Lip2Wav (HS)
    PESQ · 2020-05-17
    1.29
    best: 1.366 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Speech Recognition on Lip2Wav (HS)
    STOI · 2020-05-17
    0.446
    best: 0.504 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Speech Recognition on Lip2Wav (Chem)
    ESTOI · 2020-05-17
    0.284
    best: 0.429 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Speech Recognition on Lip2Wav (Chem)
    PESQ · 2020-05-17
    1.3
    best: 1.529 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Speech Recognition on Lip2Wav (Chem)
    STOI · 2020-05-17
    0.416
    best: 0.566 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Speech Recognition on TCD-TIMIT corpus (mixed-speech)
    ESTOI · 2020-05-17
    36.5
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Speech Recognition on TCD-TIMIT corpus (mixed-speech)
    PESQ · 2020-05-17
    1.35
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Speech Recognition on TCD-TIMIT corpus (mixed-speech)
    STOI · 2020-05-17
    0.558
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Speech Recognition on GRID corpus (mixed-speech)
    ESTOI · 2020-05-17
    0.535
    best: 0.579 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Speech Recognition on GRID corpus (mixed-speech)
    PESQ · 2020-05-17
    1.772
    best: 1.984 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Speech Recognition on GRID corpus (mixed-speech)
    STOI · 2020-05-17
    0.731
    best: 0.738 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)

Speech · 24 results

  • Visual Speech Recognition on LRW
    ESTOI · 2020-05-17
    0.344
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Visual Speech Recognition on LRW
    PESQ · 2020-05-17
    1.197
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Visual Speech Recognition on LRW
    STOI · 2020-05-17
    0.543
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Visual Speech Recognition on Lip2Wav (EH)
    ESTOI · 2020-05-17
    0.22
    best: 0.304 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Visual Speech Recognition on Lip2Wav (EH)
    PESQ · 2020-05-17
    1.367
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Visual Speech Recognition on Lip2Wav (EH)
    STOI · 2020-05-17
    0.369
    best: 0.463 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Visual Speech Recognition on Lip2Wav (Chess)
    ESTOI · 2020-05-17
    0.29
    best: 0.334 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Visual Speech Recognition on Lip2Wav (Chess)
    PESQ · 2020-05-17
    1.4
    best: 1.503 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Visual Speech Recognition on Lip2Wav (Chess)
    STOI · 2020-05-17
    0.418
    best: 0.506 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Visual Speech Recognition on Lip2Wav (DL)
    ESTOI · 2020-05-17
    0.183
    best: 0.402 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Visual Speech Recognition on Lip2Wav (DL)
    PESQ · 2020-05-17
    1.671
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Visual Speech Recognition on Lip2Wav (DL)
    STOI · 2020-05-17
    0.282
    best: 0.576 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Visual Speech Recognition on Lip2Wav (HS)
    ESTOI · 2020-05-17
    0.311
    best: 0.337 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Visual Speech Recognition on Lip2Wav (HS)
    PESQ · 2020-05-17
    1.29
    best: 1.366 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Visual Speech Recognition on Lip2Wav (HS)
    STOI · 2020-05-17
    0.446
    best: 0.504 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Visual Speech Recognition on Lip2Wav (Chem)
    ESTOI · 2020-05-17
    0.284
    best: 0.429 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Visual Speech Recognition on Lip2Wav (Chem)
    PESQ · 2020-05-17
    1.3
    best: 1.529 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Visual Speech Recognition on Lip2Wav (Chem)
    STOI · 2020-05-17
    0.416
    best: 0.566 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Visual Speech Recognition on TCD-TIMIT corpus (mixed-speech)
    ESTOI · 2020-05-17
    36.5
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Visual Speech Recognition on TCD-TIMIT corpus (mixed-speech)
    PESQ · 2020-05-17
    1.35
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Visual Speech Recognition on TCD-TIMIT corpus (mixed-speech)
    STOI · 2020-05-17
    0.558
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Visual Speech Recognition on GRID corpus (mixed-speech)
    ESTOI · 2020-05-17
    0.535
    best: 0.579 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Visual Speech Recognition on GRID corpus (mixed-speech)
    PESQ · 2020-05-17
    1.772
    best: 1.984 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Visual Speech Recognition on GRID corpus (mixed-speech)
    STOI · 2020-05-17
    0.731
    best: 0.738 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)

Computer Vision · 24 results

  • Lip to Speech Synthesis on LRW
    ESTOI · 2020-05-17
    0.344
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Lip to Speech Synthesis on LRW
    PESQ · 2020-05-17
    1.197
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Lip to Speech Synthesis on LRW
    STOI · 2020-05-17
    0.543
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Lip to Speech Synthesis on Lip2Wav (EH)
    ESTOI · 2020-05-17
    0.22
    best: 0.304 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Lip to Speech Synthesis on Lip2Wav (EH)
    PESQ · 2020-05-17
    1.367
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Lip to Speech Synthesis on Lip2Wav (EH)
    STOI · 2020-05-17
    0.369
    best: 0.463 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Lip to Speech Synthesis on Lip2Wav (Chess)
    ESTOI · 2020-05-17
    0.29
    best: 0.334 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Lip to Speech Synthesis on Lip2Wav (Chess)
    PESQ · 2020-05-17
    1.4
    best: 1.503 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Lip to Speech Synthesis on Lip2Wav (Chess)
    STOI · 2020-05-17
    0.418
    best: 0.506 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Lip to Speech Synthesis on Lip2Wav (DL)
    ESTOI · 2020-05-17
    0.183
    best: 0.402 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Lip to Speech Synthesis on Lip2Wav (DL)
    PESQ · 2020-05-17
    1.671
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Lip to Speech Synthesis on Lip2Wav (DL)
    STOI · 2020-05-17
    0.282
    best: 0.576 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Lip to Speech Synthesis on Lip2Wav (HS)
    ESTOI · 2020-05-17
    0.311
    best: 0.337 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Lip to Speech Synthesis on Lip2Wav (HS)
    PESQ · 2020-05-17
    1.29
    best: 1.366 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Lip to Speech Synthesis on Lip2Wav (HS)
    STOI · 2020-05-17
    0.446
    best: 0.504 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Lip to Speech Synthesis on Lip2Wav (Chem)
    ESTOI · 2020-05-17
    0.284
    best: 0.429 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Lip to Speech Synthesis on Lip2Wav (Chem)
    PESQ · 2020-05-17
    1.3
    best: 1.529 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Lip to Speech Synthesis on Lip2Wav (Chem)
    STOI · 2020-05-17
    0.416
    best: 0.566 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Lip to Speech Synthesis on TCD-TIMIT corpus (mixed-speech)
    ESTOI · 2020-05-17
    36.5
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Lip to Speech Synthesis on TCD-TIMIT corpus (mixed-speech)
    PESQ · 2020-05-17
    1.35
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Lip to Speech Synthesis on TCD-TIMIT corpus (mixed-speech)
    STOI · 2020-05-17
    0.558
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Lip to Speech Synthesis on GRID corpus (mixed-speech)
    ESTOI · 2020-05-17
    0.535
    best: 0.579 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Lip to Speech Synthesis on GRID corpus (mixed-speech)
    PESQ · 2020-05-17
    1.772
    best: 1.984 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Lip to Speech Synthesis on GRID corpus (mixed-speech)
    STOI · 2020-05-17
    0.731
    best: 0.738 (Visual Voice Memory)
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)

Time Series · 3 results

  • Lip Reading on TCD-TIMIT corpus (mixed-speech)
    WER · 2020-05-17
    31.26
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Lip Reading on GRID corpus (mixed-speech)
    WER · 2020-05-17
    14.08
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)
  • Lip Reading on LRW
    WER · 2020-05-17
    34.2
    SOTA
    Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis (arXiv:2005.08209)