Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Visual Question Answering (VQA) on InfiMM-Eval

Metric: Overall score (higher is better)


Results

| # | Model | Overall score | Extra Data | Paper | Date | Code |
|---|-------|---------------|------------|-------|------|------|
| 1 | GPT-4V | 74.44 | No | GPT-4 Technical Report | 2023-03-15 | Code |
| 2 | SPHINX v2 | 39.48 | No | SPHINX: The Joint Mixing of Weights, Tasks, and ... | 2023-11-13 | Code |
| 3 | Qwen-VL-Chat | 37.39 | No | Qwen-VL: A Versatile Vision-Language Model for U... | 2023-08-24 | Code |
| 4 | CogVLM-Chat | 37.16 | No | CogVLM: Visual Expert for Pretrained Language Mo... | 2023-11-06 | Code |
| 5 | LLaVA-1.5 | 32.62 | No | Improved Baselines with Visual Instruction Tuning | 2023-10-05 | Code |
| 6 | LLaMA-Adapter V2 | 30.46 | No | LLaMA-Adapter V2: Parameter-Efficient Visual Ins... | 2023-04-28 | Code |
| 7 | Emu | 28.24 | No | Emu: Generative Pretraining in Multimodality | 2023-07-11 | Code |
| 8 | InstructBLIP | 28.02 | No | InstructBLIP: Towards General-purpose Vision-Lan... | 2023-05-11 | Code |
| 9 | InternLM-XComposer-VL | 26.84 | No | InternLM-XComposer: A Vision-Language Large Mode... | 2023-09-26 | Code |
| 10 | Otter | 22.69 | No | Otter: A Multi-Modal Model with In-Context Instr... | 2023-05-05 | Code |
| 11 | mPLUG-Owl2 | 20.05 | No | mPLUG-Owl2: Revolutionizing Multi-modal Large La... | 2023-11-07 | Code |
| 12 | BLIP-2-OPT2.7B | 19.31 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 13 | MiniGPT-v2 | 10.43 | No | MiniGPT-4: Enhancing Vision-Language Understandi... | 2023-04-20 | Code |
| 14 | OpenFlamingo-v2 | 6.82 | No | OpenFlamingo: An Open-Source Framework for Train... | 2023-08-02 | Code |
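Leaderboard rows like these can be represented as simple records and ranked by the benchmark metric (Overall score, higher is better). A minimal sketch in Python, using a handful of entries from the table above; the `Entry` dataclass and its field names are illustrative assumptions, not an official Papers With Code schema:

```python
# Sketch: rank leaderboard entries by Overall score (higher is better).
# Model names, scores, and dates are copied from the table above;
# the Entry record itself is a hypothetical schema for illustration.
from dataclasses import dataclass


@dataclass
class Entry:
    model: str
    overall_score: float  # InfiMM-Eval Overall score
    date: str             # paper date, YYYY-MM-DD


entries = [
    Entry("LLaVA-1.5", 32.62, "2023-10-05"),
    Entry("GPT-4V", 74.44, "2023-03-15"),
    Entry("Qwen-VL-Chat", 37.39, "2023-08-24"),
    Entry("SPHINX v2", 39.48, "2023-11-13"),
]

# Sort descending on the metric, matching the leaderboard ordering.
ranked = sorted(entries, key=lambda e: e.overall_score, reverse=True)
for rank, e in enumerate(ranked, start=1):
    print(f"{rank}. {e.model}: {e.overall_score:.2f}")
```

Running this reproduces the relative ordering shown in the table (GPT-4V first, then SPHINX v2, Qwen-VL-Chat, and LLaVA-1.5).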