Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Visual Question Answering (VQA) on VQA v2 val

Metric: Accuracy (higher is better)
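The accuracy reported here is presumably the standard VQA consensus metric, in which each question has ten human-annotated answers and a prediction earns full credit only if at least three annotators gave that answer. A minimal sketch, assuming that formulation (the function names are illustrative, not from any leaderboard code):

```python
def vqa_accuracy(predicted: str, human_answers: list[str]) -> float:
    """Consensus accuracy for one question: full credit if >= 3 of the
    (typically 10) annotators gave the predicted answer, partial otherwise."""
    matches = sum(1 for answer in human_answers if answer == predicted)
    return min(matches / 3.0, 1.0)

def dataset_accuracy(predictions: list[str], answer_lists: list[list[str]]) -> float:
    """Mean per-question accuracy over the dataset, as a percentage."""
    scores = [vqa_accuracy(p, ans) for p, ans in zip(predictions, answer_lists)]
    return 100.0 * sum(scores) / len(scores)
```

For example, `vqa_accuracy("yes", ["yes"] * 2 + ["no"] * 8)` gives 2/3, since only two annotators agreed. The official evaluation code additionally normalizes answers (lowercasing, punctuation and article stripping) before matching, which this sketch omits.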


Results

| # | Model | Accuracy | Extra Data | Paper | Date | Code |
|---|-------|----------|------------|-------|------|------|
| 1 | BLIP-2 ViT-G OPT 6.7B (fine-tuned) | 82.19 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Yes |
| 2 | BLIP-2 ViT-G OPT 2.7B (fine-tuned) | 81.59 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Yes |
| 3 | BLIP-2 ViT-G FlanT5 XL (fine-tuned) | 81.55 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Yes |
| 4 | BLIP-2 ViT-G FlanT5 XXL (zero-shot) | 65.2 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Yes |
| 5 | PNP-VQA | 63.3 | No | Plug-and-Play VQA: Zero-shot VQA by Conjoining L... | 2022-10-17 | Yes |
| 6 | BLIP-2 ViT-G FlanT5 XL (zero-shot) | 63.1 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Yes |
| 7 | BLIP-2 ViT-L FlanT5 XL (zero-shot) | 62.6 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Yes |
| 8 | LocVLM-L | 55.9 | No | Learning to Localize Objects Improves Spatial Re... | 2024-04-11 | Yes |
| 9 | BLIP-2 ViT-G OPT 6.7B (zero-shot) | 54.3 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Yes |
| 10 | BLIP-2 ViT-G OPT 2.7B (zero-shot) | 53.5 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Yes |
| 11 | BLIP-2 ViT-L OPT 2.7B (zero-shot) | 50.1 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Yes |
| 12 | Few VLM (zero-shot) | 47.7 | No | A Good Prompt Is Worth Millions of Parameters: L... | 2021-10-16 | Yes |
| 13 | MetaLM | 41.1 | No | Language Models are General-Purpose Interfaces | 2022-06-13 | Yes |
| 14 | VLKD (ViT-B/16) | 38.6 | No | - | - | - |
| 15 | Frozen | 29.5 | No | Multimodal Few-Shot Learning with Frozen Languag... | 2021-06-25 | - |