GRIT
General Robust Image Task Benchmark
Images · Texts · Apache License 2.0 · Introduced 2022-04-28
The General Robust Image Task (GRIT) Benchmark is an evaluation-only benchmark for measuring the performance and robustness of vision systems across multiple image prediction tasks, concepts, and data sources. GRIT aims to encourage the research community to pursue the following research directions:
- General purpose vision models - GRIT facilitates the evaluation of unified and general-purpose vision models that demonstrate a wide range of skills across a diverse set of concepts.
- Robust specialized models - GRIT simplifies and unifies the quantification of misinformation, calibration, and generalization under distribution shifts due to novel concepts, novel data sources, or image distortions for 7 standard vision and vision-language tasks.
- Efficient learning - GRIT includes a *restricted* and an *unrestricted* track. The *restricted* track constrains the allowed training data to a selected but rich set of data sources, allowing more scientific and meaningful comparison between models. This is meant to encourage resource-constrained researchers to participate in the GRIT challenge and to spur interest in efficient learning methods, as opposed to the dominant paradigm of training ever larger models on ever-increasing amounts of training data. The *unrestricted* track allows much more flexibility in training data selection to test the capability of vision models trained with massive data and compute.
Benchmarks
- Object Categorization / Categorization (ablation)
- Object Categorization / Categorization (test)
- Object Localization / Localization (ablation)
- Object Localization / Localization (test)
- Object Segmentation / Segmentation (ablation)
- Object Segmentation / Segmentation (test)
- Visual Question Answering (VQA) / VQA (ablation)
- Visual Question Answering (VQA) / VQA (test)