VTAB

Visual Task Adaptation Benchmark

Introduced 2019-09-25

The Visual Task Adaptation Benchmark (VTAB) is designed to evaluate general visual representations on a diverse and challenging suite of tasks². It defines a good general visual representation as one that yields good performance on unseen tasks when the model is trained on limited task-specific data².

The VTAB benchmark contains the following 19 tasks, derived from public datasets¹:

  • Caltech101
  • CIFAR-100
  • CLEVR distance prediction
  • CLEVR counting
  • Diabetic Retinopathy
  • Dmlab Frames
  • dSprites orientation prediction
  • dSprites location prediction
  • Describable Textures Dataset (DTD)
  • EuroSAT
  • KITTI distance prediction
  • 102 Category Flower Dataset
  • Oxford IIIT Pet dataset
  • PatchCamelyon
  • Resisc45
  • Small NORB azimuth prediction
  • Small NORB elevation prediction
  • SUN397
  • SVHN

A given model is fine-tuned independently on each of the above tasks¹. Performance is measured as the average accuracy across all tasks¹. Detailed descriptions of all tasks, the evaluation protocol, and other details can be found in the VTAB paper¹.
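Concretely, the protocol can be summarized in a few lines. The sketch below is illustrative rather than the official evaluation code: the task identifiers are informal labels for the 19 tasks above (not official dataset keys), and the `fine_tune` and `evaluate` callables stand in for a real per-task training loop and test-set evaluation.

```python
from typing import Callable, Iterable

# The 19 VTAB tasks listed above; these identifiers are informal labels
# used for this sketch, not official dataset keys.
VTAB_TASKS = [
    "caltech101", "cifar100", "clevr_distance", "clevr_count",
    "diabetic_retinopathy", "dmlab", "dsprites_orientation",
    "dsprites_location", "dtd", "eurosat", "kitti_distance",
    "oxford_flowers102", "oxford_iiit_pet", "patch_camelyon",
    "resisc45", "smallnorb_azimuth", "smallnorb_elevation",
    "sun397", "svhn",
]

def vtab_score(
    fine_tune: Callable[[str], object],
    evaluate: Callable[[object, str], float],
    tasks: Iterable[str] = VTAB_TASKS,
) -> float:
    """Fine-tune independently on each task, then average test accuracy.

    `fine_tune(task)` should return a model fine-tuned from the same
    pretrained checkpoint, and `evaluate(model, task)` should return its
    test accuracy on that task; both are supplied by the caller.
    """
    accuracies = [evaluate(fine_tune(task), task) for task in tasks]
    return sum(accuracies) / len(accuracies)
```

The key point the sketch captures is that every task starts from the same pretrained checkpoint and is adapted independently, so the final score reflects how well the shared representation transfers rather than any cross-task training.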

(1) Visual Task Adaptation Benchmark. https://google-research.github.io/task_adaptation/.
(2) GitHub - google-research/task_adaptation. https://github.com/google-research/task_adaptation.
(3) GitHub - KMnP/vpt: Visual Prompt Tuning [ECCV 2022]. https://github.com/KMnP/vpt.