Spider2-V
Environment | Images | Interactive | Texts | Apache-2.0 license | Introduced 2024-07-15
A multimodal agent benchmark for professional data science and engineering workflows, featuring:
- 494 real-world tasks, ranging from data warehousing to orchestration;
- 20 professional enterprise-level applications (e.g., BigQuery, dbt, Airbyte);
- both command-line (CLI) and graphical (GUI) user interfaces;
- an interactive executable computer environment;
- a document warehouse for agent retrieval.