Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Visual Question Answering (VQA) on A-OKVQA

Metric: MC Accuracy (higher is better)
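In the A-OKVQA multiple-choice setting, each question comes with a fixed set of answer options and one annotated correct choice; MC Accuracy is simply the fraction of questions where the model selects that choice. A minimal sketch of the metric, assuming predictions and ground truth are the selected answer strings (the function and variable names here are illustrative, not from any official evaluation script):

```python
def mc_accuracy(predictions, ground_truth):
    """Fraction of questions where the predicted choice matches the
    annotated correct choice. Returns 0.0 on an empty input."""
    if not predictions:
        return 0.0
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(predictions)

# Example: 3 of 4 multiple-choice picks match the annotated answer.
preds = ["cat", "red", "two", "oven"]
gold  = ["cat", "red", "two", "stove"]
print(mc_accuracy(preds, gold))  # → 0.75
```

The leaderboard below reports this value as a percentage, so 0.75 would appear as 75.0.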


Results

| # | Model | MC Accuracy | Extra Data | Paper | Date | Code |
|---|-------|-------------|------------|-------|------|------|
| 1 | SMoLA-PaLI-X Specialist Model | 83.75 | Yes | Omni-SMoLA: Boosting Generalist Multimodal Model... | 2023-12-01 | - |
| 2 | PaLI-X-VPD | 80.4 | No | Visual Program Distillation: Distilling Tools an... | 2023-12-05 | - |
| 3 | Prophet | 75.1 | No | Prophet: Prompting Large Language Models with Co... | 2023-03-03 | Code |
| 4 | PromptCap | 73.2 | No | PromptCap: Prompt-Guided Task-Aware Image Captio... | 2022-11-15 | Code |
| 5 | MC-CoT | 71 | No | Boosting the Power of Small Multimodal Reasoning... | 2023-11-23 | Code |
| 6 | HYDRA | 56.35 | No | HYDRA: A Hyper Agent for Dynamic Compositional V... | 2024-03-19 | Code |
| 7 | GPV-2 | 53.7 | No | Webly Supervised Concept Expansion for General P... | 2022-02-04 | - |
| 8 | KRISP | 42.2 | No | KRISP: Integrating Implicit and Symbolic Knowled... | 2020-12-20 | - |
| 9 | ViLBERT - VQA | 42.1 | No | ViLBERT: Pretraining Task-Agnostic Visiolinguist... | 2019-08-06 | Code |
| 10 | LXMERT | 41.6 | No | LXMERT: Learning Cross-Modality Encoder Represen... | 2019-08-20 | Code |
| 11 | ViLBERT | 41.5 | No | ViLBERT: Pretraining Task-Agnostic Visiolinguist... | 2019-08-06 | Code |
| 12 | Pythia | 40.1 | No | Pythia v0.1: the Winning Entry to the VQA Challe... | 2018-07-26 | Code |
| 13 | ViLBERT - OK-VQA | 34.1 | No | ViLBERT: Pretraining Task-Agnostic Visiolinguist... | 2019-08-06 | Code |