Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Visual Voice Memory

Reported on 54 benchmarks across 3 tasks

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Audio: 18 results

  • Speech Recognition on Lip2Wav (EH)
    ESTOI: 0.304
    PESQ: 1.362 (best: 1.367, Lip2Wav)
    STOI: 0.463
  • Speech Recognition on Lip2Wav (Chess)
    ESTOI: 0.334
    PESQ: 1.503
    STOI: 0.506
  • Speech Recognition on Lip2Wav (DL)
    ESTOI: 0.402
    PESQ: 1.612 (best: 1.671, Lip2Wav)
    STOI: 0.576
  • Speech Recognition on Lip2Wav (HS)
    ESTOI: 0.337
    PESQ: 1.366
    STOI: 0.504
  • Speech Recognition on Lip2Wav (Chem)
    ESTOI: 0.429
    PESQ: 1.529
    STOI: 0.566
  • Speech Recognition on GRID corpus (mixed-speech)
    ESTOI: 0.579
    PESQ: 1.984
    STOI: 0.738

Speech: 18 results

  • Visual Speech Recognition on Lip2Wav (EH)
    ESTOI: 0.304
    PESQ: 1.362 (best: 1.367, Lip2Wav)
    STOI: 0.463
  • Visual Speech Recognition on Lip2Wav (Chess)
    ESTOI: 0.334
    PESQ: 1.503
    STOI: 0.506
  • Visual Speech Recognition on Lip2Wav (DL)
    ESTOI: 0.402
    PESQ: 1.612 (best: 1.671, Lip2Wav)
    STOI: 0.576
  • Visual Speech Recognition on Lip2Wav (HS)
    ESTOI: 0.337
    PESQ: 1.366
    STOI: 0.504
  • Visual Speech Recognition on Lip2Wav (Chem)
    ESTOI: 0.429
    PESQ: 1.529
    STOI: 0.566
  • Visual Speech Recognition on GRID corpus (mixed-speech)
    ESTOI: 0.579
    PESQ: 1.984
    STOI: 0.738

Computer Vision: 18 results

  • Lip to Speech Synthesis on Lip2Wav (EH)
    ESTOI: 0.304
    PESQ: 1.362 (best: 1.367, Lip2Wav)
    STOI: 0.463
  • Lip to Speech Synthesis on Lip2Wav (Chess)
    ESTOI: 0.334
    PESQ: 1.503
    STOI: 0.506
  • Lip to Speech Synthesis on Lip2Wav (DL)
    ESTOI: 0.402
    PESQ: 1.612 (best: 1.671, Lip2Wav)
    STOI: 0.576
  • Lip to Speech Synthesis on Lip2Wav (HS)
    ESTOI: 0.337
    PESQ: 1.366
    STOI: 0.504
  • Lip to Speech Synthesis on Lip2Wav (Chem)
    ESTOI: 0.429
    PESQ: 1.529
    STOI: 0.566
  • Lip to Speech Synthesis on GRID corpus (mixed-speech)
    ESTOI: 0.579
    PESQ: 1.984
    STOI: 0.738