Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Visual Question Answering (VQA) on GQA test-dev

Metric: Accuracy (higher is better)
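
As a rough illustration of the metric, the sketch below computes accuracy as the percentage of predicted answers that match the reference answer. The function name `accuracy` and the case-insensitive, whitespace-trimmed comparison are assumptions made for this example; the official GQA evaluation may normalize answers differently.

```python
def accuracy(predictions, answers):
    """Return the percentage of predictions that match the reference answers.

    Assumption: a prediction counts as correct when it equals the reference
    after trimming whitespace and lowercasing. This is an illustrative rule,
    not necessarily the one used to produce the leaderboard numbers.
    """
    if len(predictions) != len(answers):
        raise ValueError("predictions and answers must have the same length")
    if not answers:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(
        p.strip().lower() == a.strip().lower()
        for p, a in zip(predictions, answers)
    )
    return 100.0 * correct / len(answers)


# Hypothetical example: three of four answers match.
score = accuracy(["yes", "No", "red", "left"], ["yes", "no", "blue", "left"])
```

Scores on this leaderboard are reported on the same 0 to 100 scale, so a higher value means more questions answered correctly.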

Results

| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | CFR | 72.1 | No | Coarse-to-Fine Reasoning for Visual Question Ans... | 2021-10-06 | Code |
| 2 | PaLI-X-VPD | 67.3 | No | Visual Program Distillation: Distilling Tools an... | 2023-12-05 | - |
| 3 | CuMo-7B | 64.9 | Yes | CuMo: Scaling Multimodal LLM with Co-Upcycled Mi... | 2024-05-09 | Code |
| 4 | Video-LaVIT | 64.4 | No | Video-LaVIT: Unified Video-Language Pre-training... | 2024-02-05 | Code |
| 5 | NSM | 62.95 | No | Learning by Abstraction: The Neural State Machine | 2019-07-09 | Code |
| 6 | Lyrics | 62.4 | No | Lyrics: Boosting Fine-grained Language-Vision Al... | 2023-12-08 | - |
| 7 | LXMERT (Pre-train + scratch) | 60 | No | LXMERT: Learning Cross-Modality Encoder Represen... | 2019-08-20 | Code |
| 8 | single-hop + LCGN (ours) | 55.8 | No | Language-Conditioned Graph Networks for Relation... | 2019-05-10 | Code |
| 9 | HYDRA | 47.9 | No | HYDRA: A Hyper Agent for Dynamic Compositional V... | 2024-03-19 | Code |
| 10 | BLIP-2 ViT-G FlanT5 XXL (zero-shot) | 44.7 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 11 | BLIP-2 ViT-L FlanT5 XL (zero-shot) | 44.4 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 12 | BLIP-2 ViT-G FlanT5 XL (zero-shot) | 44.2 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 13 | PNP-VQA | 41.9 | No | Plug-and-Play VQA: Zero-shot VQA by Conjoining L... | 2022-10-17 | Code |
| 14 | BLIP-2 ViT-G OPT 6.7B (zero-shot) | 36.4 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 15 | BLIP-2 ViT-G OPT 2.7B (zero-shot) | 34.6 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 16 | BLIP-2 ViT-L OPT 2.7B (zero-shot) | 33.9 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 17 | FewVLM (zero-shot) | 29.3 | No | A Good Prompt Is Worth Millions of Parameters: L... | 2021-10-16 | Code |