Image features from bottom-up attention (adaptive K, ensemble)

Reported on 2 benchmarks across 1 task · 1 paper · 1 SOTA

Note: results are matched by exact model name. Different papers may use the same name for different model variants.

Natural Language Processing · 2 results

Visual Question Answering (VQA) on VQA v2 test-dev
Accuracy · 2017-08-09
69.87
best: 84.3 (PaLI)
SOTA
Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge arXiv:1708.02711
Visual Question Answering (VQA) on VQA v2 test-std
overall · 2017-08-09
70.3
best: 84.03 (BEiT-3)
Tips and Tricks for Visual Question Answering: Learnings from the 2017 Challenge arXiv:1708.02711