Spider2-V

Environment · Images · Interactive · Texts · Apache-2.0 license · Introduced 2024-07-15

A multimodal agent benchmark for professional data science and engineering workflows. It features:

  • 494 real-world tasks, ranging from data warehousing to orchestration;
  • 20 professional enterprise-level applications (e.g., BigQuery, dbt, Airbyte);
  • both command-line (CLI) and graphical (GUI) user interfaces;
  • an interactive executable computer environment;
  • a document warehouse for agent retrieval.