ScreenSpot

Images · Texts · Apache 2.0 · Introduced 2024-01-17

ScreenSpot Evaluation Benchmark

ScreenSpot is an evaluation benchmark for GUI grounding. It comprises over 1,200 instructions collected across iOS, Android, macOS, Windows, and Web environments. Each data point is annotated with the type of its target element (Text or Icon/Widget). For more details and examples, please refer to our paper.

Test Sample Details

Each test sample includes:

  • img_filename: The interface screenshot file.
  • instruction: The human-provided instruction.
  • bbox: The bounding box of the target element corresponding to the instruction.
  • data_type: The type of the target element, either "icon" or "text".
  • data_source: The interface platform, which could be iOS, Android, macOS, Windows, or Web (e.g., GitLab, Shop, Forum, Tool).
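As a rough illustration of how these fields might be used, the sketch below models one test sample and scores predicted click points against the target bounding box, reporting accuracy separately for "icon" and "text" elements. Several things here are assumptions rather than details stated above: the bbox layout is taken to be (x, y, width, height) in pixels, the "point falls inside the box" criterion is one common way to score grounding and may differ from the paper's protocol, and the names ScreenSpotSample, point_in_bbox, and accuracy_by_type are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class ScreenSpotSample:
    """One test sample, mirroring the fields listed above (hypothetical class)."""
    img_filename: str
    instruction: str
    # ASSUMPTION: (x, y, width, height) in pixels; verify against the actual data.
    bbox: Tuple[float, float, float, float]
    data_type: str    # "icon" or "text"
    data_source: str  # e.g. "iOS", "Android", "macOS", "Windows", or a Web source


def point_in_bbox(point: Tuple[float, float],
                  bbox: Tuple[float, float, float, float]) -> bool:
    """Return True if the predicted point lies inside the box (edges included)."""
    px, py = point
    x, y, w, h = bbox
    return x <= px <= x + w and y <= py <= y + h


def accuracy_by_type(samples: Iterable[ScreenSpotSample],
                     predictions: Iterable[Tuple[float, float]]) -> Dict[str, float]:
    """Fraction of predicted points that hit the target box, grouped by data_type."""
    hits: Dict[str, int] = {}
    totals: Dict[str, int] = {}
    for sample, point in zip(samples, predictions):
        totals[sample.data_type] = totals.get(sample.data_type, 0) + 1
        if point_in_bbox(point, sample.bbox):
            hits[sample.data_type] = hits.get(sample.data_type, 0) + 1
    return {t: hits.get(t, 0) / totals[t] for t in totals}
```

With two made-up samples, for instance an icon with bbox (10, 20, 30, 40) and a text element with bbox (0, 0, 100, 20), calling accuracy_by_type with the predicted points (25, 40) and (150, 10) would count the first as a hit and the second as a miss. If the stored boxes turn out to use corner coordinates (x1, y1, x2, y2) instead, only point_in_bbox needs to change.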