VTAB
Visual Task Adaptation Benchmark
The Visual Task Adaptation Benchmark (VTAB) is a benchmark designed to evaluate general visual representations². It consists of a diverse and challenging suite of tasks². VTAB defines a good general visual representation as one that achieves strong performance on previously unseen tasks when trained with limited task-specific data².
The VTAB benchmark contains the following 19 tasks, all derived from public datasets¹ (a loading sketch follows the list):
- Caltech101
- CIFAR-100
- CLEVR distance prediction
- CLEVR counting
- Diabetic Retinopathy
- DMLab frames
- dSprites orientation prediction
- dSprites location prediction
- Describable Textures Dataset (DTD)
- EuroSAT
- KITTI distance prediction
- 102 Category Flower Dataset
- Oxford-IIIT Pet dataset
- PatchCamelyon
- Resisc45
- SmallNORB azimuth prediction
- SmallNORB elevation prediction
- SUN397
- SVHN
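
Most of these tasks are built on datasets available through public loaders. Below is a minimal sketch using tensorflow_datasets; the dataset identifier is a TFDS name chosen purely for illustration, and the official VTAB pipeline in the task_adaptation repository applies its own splits and preprocessing:

```python
import tensorflow_datasets as tfds

# Illustrative TFDS identifier for one of the source datasets;
# the official VTAB setup defines its own train/val splits.
ds_train = tfds.load("cifar100", split="train", as_supervised=True)
ds_test = tfds.load("cifar100", split="test", as_supervised=True)

# Inspect a single (image, label) example.
for image, label in ds_train.take(1):
    print(image.shape, label.numpy())
```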
The model under evaluation is fine-tuned independently on each of the above tasks¹. Overall performance is measured as the average accuracy across all tasks¹. A detailed description of the tasks, the evaluation protocol, and other details can be found in the VTAB paper¹.
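
In other words, the benchmark score is the unweighted mean of the per-task accuracies. A minimal sketch, using placeholder task names and hypothetical accuracy values:

```python
# Hypothetical per-task top-1 accuracies after independent fine-tuning;
# real use would include one entry for each of the 19 VTAB tasks.
task_accuracy = {
    "caltech101": 0.90,
    "cifar100": 0.72,
    "dtd": 0.68,
    # ... one entry per remaining task
}

# The VTAB score is the unweighted mean over all tasks.
vtab_score = sum(task_accuracy.values()) / len(task_accuracy)
print(f"VTAB score (mean accuracy): {vtab_score:.4f}")
```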
(1) Visual Task Adaptation Benchmark. https://google-research.github.io/task_adaptation/
(2) GitHub - google-research/task_adaptation. https://github.com/google-research/task_adaptation
(3) GitHub - KMnP/vpt: Visual Prompt Tuning [ECCV 2022]. https://github.com/KMnP/vpt