Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

PaLI-X-VPD

Reported on 7 benchmarks across 3 tasks · 1 paper · 2 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
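
The effect of exact-name matching can be illustrated with a minimal sketch. It assumes results are keyed by the literal model-name string; the function `group_by_exact_name` and the row layout are hypothetical and are not the site's actual code. The sample rows are copied from this page.

```python
from collections import defaultdict

# Sample rows copied from this page: (model name, benchmark, metric, value).
# The row layout is an assumption made for illustration only.
RESULTS = [
    ("PaLI-X-VPD", "OK-VQA", "Accuracy", 66.8),
    ("PaLI-X-VPD", "GQA test-dev", "Accuracy", 67.3),
    ("SMoLA-PaLI-X Specialist Model", "A-OKVQA", "DA VQA Score", 70.55),
    ("SMoLA-PaLI-X Specialist", "TallyQA-Complex", "Accuracy", 77.1),
]


def group_by_exact_name(rows):
    """Group result rows by the model-name string, compared verbatim."""
    groups = defaultdict(list)
    for name, benchmark, metric, value in rows:
        groups[name].append((benchmark, metric, value))
    return dict(groups)


groups = group_by_exact_name(RESULTS)
for name, rows in groups.items():
    print(f"{name}: {len(rows)} result(s)")
```

Under this assumption, "SMoLA-PaLI-X Specialist Model" and "SMoLA-PaLI-X Specialist" land in separate groups because the strings differ, whether or not they refer to the same model. The reverse also holds: two different variants published under one name would be merged.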

Natural Language Processing (5 results)

  • Visual Question Answering (VQA) on OK-VQA
    Accuracy · 2023-12-05
    66.8
    SOTA
    Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models (arXiv:2312.03052)
  • Meme Classification on Hateful Memes
    ROC-AUC · 2023-12-05
    0.892
    best: 0.911 (RA-HMD (Qwen2-VL-7B))
    SOTA
    Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models (arXiv:2312.03052)
  • Visual Question Answering (VQA) on A-OKVQA
    DA VQA Score · 2023-12-05
    68.2
    best: 70.55 (SMoLA-PaLI-X Specialist Model)
    Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models (arXiv:2312.03052)
  • Visual Question Answering (VQA) on A-OKVQA
    MC Accuracy · 2023-12-05
    80.4
    best: 83.75 (SMoLA-PaLI-X Specialist Model)
    Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models (arXiv:2312.03052)
  • Visual Question Answering (VQA) on GQA test-dev
    Accuracy · 2023-12-05
    67.3
    best: 72.1 (CFR)
    Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models (arXiv:2312.03052)

Computer Vision (2 results)

  • Object Counting on TallyQA-Complex
    Accuracy · 2023-12-05
    76.6
    best: 77.1 (SMoLA-PaLI-X Specialist)
    Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models (arXiv:2312.03052)
  • Object Counting on TallyQA-Simple
    Accuracy · 2023-12-05
    86.2
    best: 86.3 (SMoLA-PaLI-X Specialist)
    Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models (arXiv:2312.03052)