Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

VLMo

Reported on 14 benchmarks across 4 tasks · 1 paper · 9 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
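As a rough illustration of what exact-name matching implies, the sketch below groups result rows by their model-name string with no normalization. The row layout and the `group_by_exact_name` helper are hypothetical and are not the site's actual code; `VLMo-Large` is used only as an example of a string that would not match `VLMo`.

```python
from collections import defaultdict

def group_by_exact_name(rows):
    """Group (model_name, benchmark, value) rows by the exact name string."""
    groups = defaultdict(list)
    for name, benchmark, value in rows:
        # No case-folding, trimming, or alias handling: the string must match exactly.
        groups[name].append((benchmark, value))
    return dict(groups)

rows = [
    ("VLMo", "NLVR2 Dev", 85.64),       # value listed on this page
    ("VLMo", "NLVR2 Test", 86.86),      # value listed on this page
    ("VLMo-Large", "NLVR2 Dev", None),  # hypothetical name, placeholder value
]

groups = group_by_exact_name(rows)
print(sorted(groups))
print(len(groups["VLMo"]))
```

Under this assumption, a variant reported under a different string lands in its own group, while two different variants reported under the same string would be merged.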

Natural Language Processing · 5 results

  • Visual Question Answering (VQA) on VQA v2 test-dev
    Accuracy · 2021-11-03
    82.78
    best: 84.3 (PaLI)
    SOTA
    VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts (arXiv:2111.02358)
  • Visual Question Answering (VQA) on VQA v2 test-std
    number · 2021-11-03
    67.26
    best: 72.24 (ONE-PEACE)
    SOTA
    VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts (arXiv:2111.02358)
  • Visual Question Answering (VQA) on VQA v2 test-std
    other · 2021-11-03
    72.87
    best: 77.02 (mPLUG-Huge)
    SOTA
    VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts (arXiv:2111.02358)
  • Visual Question Answering (VQA) on VQA v2 test-std
    overall · 2021-11-03
    81.3
    best: 84.03 (BEiT-3)
    SOTA
    VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts (arXiv:2111.02358)
  • Visual Question Answering (VQA) on VQA v2 test-std
    yes/no · 2021-11-03
    94.68
    best: 94.85 (ONE-PEACE)
    SOTA
    VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts (arXiv:2111.02358)

Computer Vision · 4 results

  • Image Retrieval on PhotoChat
    R@10 · 2021-11-03
    39.4
    best: 49.6 (PaCE)
    SOTA
    VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts (arXiv:2111.02358)
  • Image Retrieval on PhotoChat
    Sum(R@1,5,10) · 2021-11-03
    83.2
    best: 101.5 (PaCE)
    SOTA
    VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts (arXiv:2111.02358)
  • Image Retrieval on PhotoChat
    R1 · 2021-11-03
    11.5
    best: 15.2 (PaCE)
    VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts (arXiv:2111.02358)
  • Image Retrieval on PhotoChat
    R@5 · 2021-11-03
    30
    best: 36.7 (PaCE)
    VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts (arXiv:2111.02358)

Methodology · 3 results

  • Retrieval on Image-Chat
    R@1 · 2021-11-03
    46.8
    best: 51.9 (PaCE)
    VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts (arXiv:2111.02358)
  • Retrieval on Image-Chat
    R@5 · 2021-11-03
    67.5
    best: 76.8 (PaCE)
    VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts (arXiv:2111.02358)
  • Retrieval on Image-Chat
    Sum(R@1,5) · 2021-11-03
    114.3
    best: 128.7 (PaCE)
    VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts (arXiv:2111.02358)
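
The Sum(R@1,5) entry for Image-Chat appears to be the plain sum of the R@1 and R@5 values listed with it. A minimal sketch of that check follows, assuming the metric is a simple addition of recall percentages; the `sum_recall` helper is hypothetical and used only for illustration.

```python
def sum_recall(recalls):
    """Add recall values (in percent) and round to one decimal place."""
    return round(sum(recalls), 1)

# Values copied from the Image-Chat rows on this page.
image_chat = {"R@1": 46.8, "R@5": 67.5}
listed_sum = 114.3  # the Sum(R@1,5) value listed on this page

computed = sum_recall(image_chat.values())
# Compare with a small tolerance rather than exact float equality.
matches = abs(computed - listed_sum) < 0.05
print(computed, matches)
```

Rounding to one decimal keeps the comparison at the precision the page reports.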

Reasoning · 2 results

  • Visual Reasoning on NLVR2 Dev
    Accuracy · 2021-11-03
    85.64
    best: 91.51 (BEiT-3)
    SOTA
    VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts (arXiv:2111.02358)
  • Visual Reasoning on NLVR2 Test
    Accuracy · 2021-11-03
    86.86
    best: 92.58 (BEiT-3)
    SOTA
    VLMo: Unified Vision-Language Pre-Training with Mixture-of-Modality-Experts (arXiv:2111.02358)