Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Visual Question Answering (VQA) on InfiMM-Eval

Metric: Analogical reasoning score (higher is better)


Results

| # | Model | Analogical | Extra Data | Paper | Date | Code |
|---|-------|------------|------------|-------|------|------|
| 1 | GPT-4V | 69.86 | No | GPT-4 Technical Report | 2023-03-15 | Code |
| 2 | Qwen-VL-Chat | 30.42 | No | Qwen-VL: A Versatile Vision-Language Model for U... | 2023-08-24 | Code |
| 3 | CogVLM-Chat | 28.75 | No | CogVLM: Visual Expert for Pretrained Language Mo... | 2023-11-06 | Code |
| 4 | LLaVA-1.5 | 24.31 | No | Improved Baselines with Visual Instruction Tuning | 2023-10-05 | Code |
| 5 | LLaMA-Adapter V2 | 22.08 | No | LLaMA-Adapter V2: Parameter-Efficient Visual Ins... | 2023-04-28 | Code |
| 6 | SPHINX v2 | 20.69 | No | SPHINX: The Joint Mixing of Weights, Tasks, and ... | 2023-11-13 | Code |
| 7 | InstructBLIP | 20.56 | No | InstructBLIP: Towards General-purpose Vision-Lan... | 2023-05-11 | Code |
| 8 | InternLM-XComposer-VL | 18.61 | No | InternLM-XComposer: A Vision-Language Large Mode... | 2023-09-26 | Code |
| 9 | Emu | 18.19 | No | Emu: Generative Pretraining in Multimodality | 2023-07-11 | Code |
| 10 | Otter | 13.33 | No | Otter: A Multi-Modal Model with In-Context Instr... | 2023-05-05 | Code |
| 11 | mPLUG-Owl2 | 7.64 | No | mPLUG-Owl2: Revolutionizing Multi-modal Large La... | 2023-11-07 | Code |
| 12 | BLIP-2-OPT2.7B | 7.5 | No | BLIP-2: Bootstrapping Language-Image Pre-trainin... | 2023-01-30 | Code |
| 13 | MiniGPT-v2 | 5.69 | No | MiniGPT-4: Enhancing Vision-Language Understandi... | 2023-04-20 | Code |
| 14 | OpenFlamingo-v2 | 1.11 | No | OpenFlamingo: An Open-Source Framework for Train... | 2023-08-02 | Code |