GRIT
General Robust Image Task Benchmark
Images · Texts · Apache License 2.0 · Introduced 2022-04-28
The General Robust Image Task (GRIT) Benchmark is an evaluation-only benchmark for measuring the performance and robustness of vision systems across multiple image prediction tasks, concepts, and data sources. GRIT aims to encourage the research community to pursue the following research directions:
- General purpose vision models - GRIT facilitates the evaluation of unified and general-purpose vision models that demonstrate a wide range of skills across a diverse set of concepts.
- Robust specialized models - GRIT simplifies and unifies the quantification of misinformation, calibration, and generalization under distribution shifts due to novel concepts, novel data sources, or image distortions for 7 standard vision and vision-language tasks.
- Efficient learning - GRIT includes a *restricted* and an *unrestricted* track. The *restricted* track constrains the allowed training data to a selected but rich set of data sources, allowing more scientific and meaningful comparison between models. This is meant to encourage resource-constrained researchers to participate in the GRIT challenge and to spur interest in efficient learning methods, as opposed to the dominant paradigm of training ever larger models on ever-increasing amounts of training data. The *unrestricted* track allows much more flexibility in training data selection to test the capability of vision models trained with massive data and compute.
Benchmarks
- Object Categorization / Categorization (ablation)
- Object Categorization / Categorization (test)
- Object Localization / Localization (ablation)
- Object Localization / Localization (test)
- Object Segmentation / Segmentation (ablation)
- Object Segmentation / Segmentation (test)
- Visual Question Answering (VQA) / VQA (ablation)
- Visual Question Answering (VQA) / VQA (test)