Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Single Model

Reported on 39 benchmarks across 2 tasks · 2 papers · 14 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing · 39 results

  • Image Captioning on nocaps entire
    B1 · 2021-08-24
    83.78
    best: 88.1 (GIT, Single Model)
    SOTA
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps entire
    B2 · 2021-08-24
    68.86
    best: 74.81 (GIT, Single Model)
    SOTA
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps entire
    B3 · 2021-08-24
    51.06
    best: 57.68 (GIT, Single Model)
    SOTA
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps entire
    B4 · 2021-08-24
    32.2
    best: 37.71 (CoCa - Google Brain)
    SOTA
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps entire
    CIDEr · 2021-08-24
    110.31
    best: 126.8 (Lyrics)
    SOTA
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps entire
    METEOR · 2021-08-24
    30.55
    best: 32.5 (GIT, Single Model)
    SOTA
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps entire
    ROUGE-L · 2021-08-24
    59.86
    best: 63.12 (GIT, Single Model)
    SOTA
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps entire
    SPICE · 2021-08-24
    14.49
    best: 15.94 (GIT, Single Model)
    SOTA
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps out-of-domain
    SPICE · 2021-08-24
    13.89
    best: 15.7 (GIT, Single Model)
    SOTA
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Visual Question Answering (VQA) on GQA Test2019
    Accuracy · 2021-01-02
    64.65
    best: 89.3 (human)
    SOTA
    VinVL: Revisiting Visual Representations in Vision-Language Models (arXiv:2101.00529)
  • Visual Question Answering (VQA) on GQA Test2019
    Binary · 2021-01-02
    82.63
    best: 91.2 (human)
    SOTA
    VinVL: Revisiting Visual Representations in Vision-Language Models (arXiv:2101.00529)
  • Visual Question Answering (VQA) on GQA Test2019
    Consistency · 2021-01-02
    94.35
    best: 98.4 (human)
    SOTA
    VinVL: Revisiting Visual Representations in Vision-Language Models (arXiv:2101.00529)
  • Visual Question Answering (VQA) on GQA Test2019
    Open · 2021-01-02
    48.77
    best: 87.4 (human)
    SOTA
    VinVL: Revisiting Visual Representations in Vision-Language Models (arXiv:2101.00529)
  • Visual Question Answering (VQA) on GQA Test2019
    Validity · 2021-01-02
    96.62
    best: 98.9 (human)
    SOTA
    VinVL: Revisiting Visual Representations in Vision-Language Models (arXiv:2101.00529)
  • Image Captioning on nocaps near-domain
    B1 · 2021-08-24
    84.36
    best: 88.9 (GIT2, Single Model)
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps near-domain
    B2 · 2021-08-24
    69.83
    best: 75.86 (GIT2, Single Model)
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps near-domain
    B3 · 2021-08-24
    52.42
    best: 58.99 (PaLI)
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps near-domain
    B4 · 2021-08-24
    33.74
    best: 39.98 (PaLI)
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps near-domain
    CIDEr · 2021-08-24
    110.76
    best: 125.51 (GIT2, Single Model)
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps near-domain
    METEOR · 2021-08-24
    30.97
    best: 33.47 (PaLI)
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps near-domain
    ROUGE-L · 2021-08-24
    60.46
    best: 63.99 (PaLI)
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps near-domain
    SPICE · 2021-08-24
    14.61
    best: 16.11 (GIT2, Single Model)
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps out-of-domain
    B1 · 2021-08-24
    80.89
    best: 86.28 (PaLI)
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps out-of-domain
    B2 · 2021-08-24
    64.21
    best: 71.28 (GIT, Single Model)
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps out-of-domain
    B3 · 2021-08-24
    44.38
    best: 52.66 (GIT, Single Model)
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps out-of-domain
    B4 · 2021-08-24
    24.47
    best: 32 (PaLI)
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps out-of-domain
    CIDEr · 2021-08-24
    109.49
    best: 126.67 (PaLI)
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps out-of-domain
    METEOR · 2021-08-24
    27.91
    best: 30.99 (PaLI)
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps out-of-domain
    ROUGE-L · 2021-08-24
    56.69
    best: 61.35 (PaLI)
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps in-domain
    B1 · 2021-08-24
    84.64
    best: 88.86 (GIT2, Single Model)
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps in-domain
    B2 · 2021-08-24
    70
    best: 76.1 (GIT, Single Model)
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps in-domain
    B3 · 2021-08-24
    52.96
    best: 60.53 (GIT, Single Model)
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps in-domain
    B4 · 2021-08-24
    34.66
    best: 41.65 (GIT, Single Model)
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps in-domain
    CIDEr · 2021-08-24
    108.98
    best: 149.1 (PaLI)
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps in-domain
    METEOR · 2021-08-24
    31.97
    best: 34.22 (PaLI)
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps in-domain
    ROUGE-L · 2021-08-24
    61.01
    best: 64.39 (PaLI)
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Image Captioning on nocaps in-domain
    SPICE · 2021-08-24
    14.6
    best: 16.36 (GIT2, Single Model)
    SimVLM: Simple Visual Language Model Pretraining with Weak Supervision (arXiv:2108.10904)
  • Visual Question Answering (VQA) on GQA Test2019
    Distribution · 2021-01-02
    4.72
    best: 93.08 (GlobalPrior)
    VinVL: Revisiting Visual Representations in Vision-Language Models (arXiv:2101.00529)
  • Visual Question Answering (VQA) on GQA Test2019
    Plausibility · 2021-01-02
    84.98
    best: 97.2 (human)
    VinVL: Revisiting Visual Representations in Vision-Language Models (arXiv:2101.00529)