Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Gamified crowd-sourcing of high-quality data for visual fine-tuning

Shashank Yadav, Rohan Tomar, Garvit Jain, Chirag Ahooja, Shubham Chaudhary, Charles Elkan

2024-10-05 · Visual Question Answering
Paper | PDF

Abstract

This paper introduces Gamified Adversarial Prompting (GAP), a framework that crowd-sources high-quality data for visual instruction tuning of large multimodal models. GAP transforms the data collection process into an engaging game, incentivizing players to provide fine-grained, challenging questions and answers that target gaps in the model's knowledge. Our contributions include (1) an approach to capture question-answer pairs from humans that directly address weaknesses in a model's knowledge, (2) a method for evaluating and rewarding players that successfully incentivizes them to provide high-quality submissions, and (3) a scalable, gamified platform that succeeds in collecting this data from over 50,000 participants in just a few weeks. Our implementation of GAP has significantly improved the accuracy of a small multimodal model, namely MiniCPM-Llama3-V-2.5-8B, increasing its GPT score from 0.147 to 0.477 on our dataset, approaching the benchmark set by the much larger GPT-4V. Moreover, we demonstrate that the data generated using MiniCPM-Llama3-V-2.5-8B also enhances its performance across other benchmarks and exhibits cross-model benefits: the same data improves the performance of Qwen2-VL-2B and Qwen2-VL-7B on those same benchmarks.
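
The core mechanic described in the abstract, rewarding a player only when their question makes the target model fail, can be illustrated with a short sketch. Everything below (the Submission type, ask_model, judge_answer, the token-overlap scoring, the threshold) is a hypothetical stand-in for illustration under stated assumptions, not the authors' platform or evaluation code.

```python
# Illustrative GAP-style reward loop: pay out only when the player's question
# exposes a weakness in the target model. All names and the toy judge below
# are assumptions, not the paper's implementation.

from dataclasses import dataclass


@dataclass
class Submission:
    image_id: str
    question: str
    reference_answer: str  # the player's own ground-truth answer
    player_id: str


def ask_model(image_id: str, question: str) -> str:
    """Placeholder for a call to the target multimodal model
    (e.g. MiniCPM-Llama3-V-2.5-8B); returns a canned answer here."""
    return "a red bicycle"


def judge_answer(reference: str, model_answer: str) -> float:
    """Toy stand-in for answer evaluation: crude token overlap in [0, 1].
    A real system would use a stronger judge (e.g. an LLM grader)."""
    ref = set(reference.lower().split())
    ans = set(model_answer.lower().split())
    return len(ref & ans) / max(len(ref), 1)


def score_submission(sub: Submission, fail_threshold: float = 0.5) -> float:
    """Reward the player only if the model's answer scores below the threshold,
    i.e. the question targets a genuine knowledge gap."""
    model_answer = ask_model(sub.image_id, sub.question)
    quality = judge_answer(sub.reference_answer, model_answer)
    return 1.0 if quality < fail_threshold else 0.0


if __name__ == "__main__":
    sub = Submission("img_0042", "What brand is printed on the bicycle frame?",
                     "It says 'Peugeot' on the down tube", "player_17")
    print(score_submission(sub))  # 1.0: the canned answer misses the brand
```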

Results

Task | Dataset | Metric | Value | Model
Visual Question Answering (VQA) | MM-Vet | GPT-4 score | 64.954 | Qwen2-VL-7B (finetuned on GAP-VQA train)
Visual Question Answering (VQA) | MM-Vet | GPT-4 score | 52.43 | Qwen2-VL-2B (finetuned on GAP-VQA train)
Visual Question Answering (VQA) | MM-Vet | GPT-4 score | 51.789 | MiniCPM-Llama3-V-2.5-8B (finetuned on GAP-VQA train)
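
The "GPT-4 score" metric above comes from GPT-as-judge grading of model answers. The sketch below shows the general idea using the openai Python client; the prompt wording, the 0-to-1 scale, and the choice of judge model are assumptions and do not reproduce the exact MM-Vet or GAP-VQA grading protocol.

```python
# Minimal sketch of GPT-as-judge scoring for a single VQA prediction.
# Assumptions: prompt wording, 0.0-1.0 scale, and "gpt-4" as the judge model.

from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment


def gpt_score(question: str, reference: str, prediction: str,
              judge_model: str = "gpt-4") -> float:
    prompt = (
        "Grade the predicted answer against the reference answer.\n"
        f"Question: {question}\n"
        f"Reference answer: {reference}\n"
        f"Predicted answer: {prediction}\n"
        "Reply with a single number between 0.0 (wrong) and 1.0 (fully correct)."
    )
    resp = client.chat.completions.create(
        model=judge_model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    try:
        return float(resp.choices[0].message.content.strip())
    except ValueError:
        return 0.0  # judge replied with free-form text; treat as unparseable


# Averaging gpt_score over a benchmark split and multiplying by 100 gives a
# value on the same scale as the leaderboard numbers in the table above.
```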

Related Papers

Describe Anything Model for Visual Question Answering on Text-rich Images (2025-07-16)
Evaluating Attribute Confusion in Fashion Text-to-Image Generation (2025-07-09)
LinguaMark: Do Multimodal Models Speak Fairly? A Benchmark-Based Evaluation (2025-07-09)
Barriers in Integrating Medical Visual Question Answering into Radiology Workflows: A Scoping Review and Clinicians' Insights (2025-07-09)
MagiC: Evaluating Multimodal Cognition Toward Grounded Visual Reasoning (2025-07-09)
Enhancing Scientific Visual Question Answering through Multimodal Reasoning and Ensemble Modeling (2025-07-08)
ReLoop: "Seeing Twice and Thinking Backwards" via Closed-loop Training to Mitigate Hallucinations in Multimodal understanding (2025-07-07)
Revisiting CroPA: A Reproducibility Study and Enhancements for Cross-Prompt Adversarial Transferability in Vision-Language Models (2025-06-28)