Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

ViLT

Reported on 34 benchmarks across 10 tasks · 5 papers · 6 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
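
The effect of exact-name matching can be illustrated with a minimal sketch. The records, field names, and the `group_by_exact_name` helper below are hypothetical and are not taken from the site's implementation; they only show how entries from different papers end up under one name while a differently spelled variant does not.

```python
from collections import defaultdict

# Hypothetical result records; names and fields are illustrative only.
results = [
    {"model": "ViLT", "paper": "paper-A", "benchmark": "bench-1"},
    {"model": "ViLT", "paper": "paper-B", "benchmark": "bench-2"},
    {"model": "ViLT (fine-tuned)", "paper": "paper-A", "benchmark": "bench-1"},
]


def group_by_exact_name(records):
    """Group records by model name, compared character for character."""
    groups = defaultdict(list)
    for record in records:
        groups[record["model"]].append(record)
    return dict(groups)


grouped = group_by_exact_name(results)
# Both "ViLT" records are grouped together even though they come from
# different papers; the differently named variant forms its own group.
papers_for_vilt = sorted({r["paper"] for r in grouped["ViLT"]})
```

Under this assumption, a page for "ViLT" would mix results from paper-A and paper-B, which is why the same name may cover different model variants.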

Natural Language Processing · 22 results

  • Common Sense Reasoning on WinoGAViL
    Jaccard Index · 2022-07-25
    52
    SOTA
    WinoGAViL: Gamified Association Benchmark to Challenge Vision-and-Language Models · arXiv:2207.12576
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Geo.) · 2021-10-25
    82.61
    SOTA
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Reading Comprehension on MMDialog
    F1 · 2021-02-05
    55.8
    best: 77.6 (PaCE)
    SOTA
    ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision · arXiv:2102.03334
  • Inductive knowledge graph completion on MARS (Multimodal Analogical Reasoning dataSet)
    MRR · 2022-10-01
    0.257
    best: 0.341 (MarT_MKGformer)
    Multimodal Analogical Reasoning over Knowledge Graphs · arXiv:2210.00312
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Alg.) · 2021-10-25
    50.55
    best: 56.73 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Com.) · 2021-10-25
    84.95
    best: 87 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Cou.) · 2021-10-25
    71.13
    best: 77.81 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Est.) · 2021-10-25
    99.02
    best: 99.54 (Top-Down)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Fra.) · 2021-10-25
    75.81
    best: 82.13 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Mea.) · 2021-10-25
    98.91
    best: 99.46 (Top-Down)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Pat.) · 2021-10-25
    59.22
    best: 68.75 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Pro.) · 2021-10-25
    87.65
    best: 95.73 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Sce.) · 2021-10-25
    66.72
    best: 68.8 (ViT)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Sen.) · 2021-10-25
    86.1
    best: 92.49 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Spa.) · 2021-10-25
    53.38
    best: 55.62 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Reasoning (Tim.) · 2021-10-25
    69.99
    best: 77.98 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Sub-tasks (Blank) · 2021-10-25
    79.27
    best: 83.62 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Sub-tasks (Img.) · 2021-10-25
    79.67
    best: 82.66 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Visual Question Answering (VQA) on IconQA
    Sub-tasks (Txt.) · 2021-10-25
    72.69
    best: 75.19 (Patch-TRM)
    IconQA: A New Benchmark for Abstract Diagram Understanding and Visual Language Reasoning · arXiv:2110.13214
  • Reading Comprehension on PhotoChat
    F1 · 2021-02-05
    52.4
    best: 63.8 (PaCE)
    ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision · arXiv:2102.03334
  • Reading Comprehension on PhotoChat
    Precision · 2021-02-05
    55.4
    best: 63.3 (PaCE)
    ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision · arXiv:2102.03334
  • Reading Comprehension on PhotoChat
    Recall · 2021-02-05
    58.9
    best: 68 (PaCE)
    ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision · arXiv:2102.03334

Computer Vision · 4 results

  • Image Retrieval on PhotoChat
    R@1 · 2021-02-05
    11.5
    best: 15.2 (PaCE)
    SOTA
    ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision · arXiv:2102.03334
  • Image Retrieval on PhotoChat
    R@5 · 2021-02-05
    33.8
    best: 36.7 (PaCE)
    SOTA
    ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision · arXiv:2102.03334
  • Image Retrieval on PhotoChat
    R@10 · 2021-02-05
    25.6
    best: 49.6 (PaCE)
    ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision · arXiv:2102.03334
  • Image Retrieval on PhotoChat
    Sum(R@1,5,10) · 2021-02-05
    71
    best: 101.5 (PaCE)
    ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision · arXiv:2102.03334
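
The Sum(R@1,5,10) entry can be checked against the three recall values listed for ViLT on PhotoChat. The short sketch below assumes the metric is a plain sum of the three recalls, rounded to the nearest integer; the variable names are illustrative.

```python
# Recall values for ViLT on PhotoChat image retrieval, as listed above.
recall_at_1 = 11.5
recall_at_5 = 33.8
recall_at_10 = 25.6

# Assumption: Sum(R@1,5,10) is the plain sum of the three recall values.
total = recall_at_1 + recall_at_5 + recall_at_10

# Rounded to the nearest integer for comparison with the listed value of 71.
rounded_total = round(total)
```

Under that assumption the listed 71 agrees with the three component values. Recall@k normally does not decrease as k grows, so the listed R@10 falling below R@5 may be worth checking against the source paper.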

Miscellaneous · 4 results

  • Intent Recognition on MMDialog
    F1 · 2021-02-05
    55.8
    best: 77.6 (PaCE)
    SOTA
    ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision · arXiv:2102.03334
  • Intent Recognition on PhotoChat
    F1 · 2021-02-05
    52.4
    best: 63.8 (PaCE)
    ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision · arXiv:2102.03334
  • Intent Recognition on PhotoChat
    Precision · 2021-02-05
    55.4
    best: 63.3 (PaCE)
    ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision · arXiv:2102.03334
  • Intent Recognition on PhotoChat
    Recall · 2021-02-05
    58.9
    best: 68 (PaCE)
    ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision · arXiv:2102.03334

Knowledge Base · 2 results

  • Knowledge Graphs on MARS (Multimodal Analogical Reasoning dataSet)
    MRR · 2022-10-01
    0.257
    best: 0.341 (MarT_MKGformer)
    Multimodal Analogical Reasoning over Knowledge Graphs · arXiv:2210.00312
  • Knowledge Graph Completion on MARS (Multimodal Analogical Reasoning dataSet)
    MRR · 2022-10-01
    0.257
    best: 0.341 (MarT_MKGformer)
    Multimodal Analogical Reasoning over Knowledge Graphs · arXiv:2210.00312

Methodology · 1 result

  • Large Language Model on MARS (Multimodal Analogical Reasoning dataSet)
    MRR · 2022-10-01
    0.257
    best: 0.341 (MarT_MKGformer)
    Multimodal Analogical Reasoning over Knowledge Graphs · arXiv:2210.00312

Reasoning · 1 result

  • Visual Reasoning on VSR
    Accuracy · 2022-04-30
    69.3
    best: 70.1 (LXMERT)
    Visual Spatial Reasoning · arXiv:2205.00363