CURIE
CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning
The data is organized into eight domain-specific subfolders: "biogr", "dft", "pdb", "geo", "mpve", "qecc_65", "hfd", and "hfe". Each subfolder contains two further subfolders: "ground_truth" and "inputs". Within these, each data instance is stored in a JSON file named record_id.json, where record_id is a unique identifier. The "biogr" domain also includes image inputs as record_id.png files alongside the corresponding JSON.
data
├── domain
│   ├── inputs
│   │   └── record_id.json
│   └── ground_truth
│       └── record_id.json
└── difficulty_levels.json
Ground truth data varies in structure and content across domains, but every file includes a record_id field matching its filename. Input files share a uniform structure across all domains: each contains a record_id field and a text field holding the input text presented to the LLM.
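Given this layout, a single example can be loaded by pairing the input and ground-truth JSON files that share a record_id. The helper below is a minimal sketch (the function name and its data_root parameter are illustrative, not part of the released code); it assumes only the fields described above.

```python
import json
from pathlib import Path


def load_example(data_root: Path, domain: str, record_id: str):
    """Load one data instance: the LLM input text and its ground-truth record.

    data_root: path to the top-level "data" folder.
    domain: one of the domain subfolders, e.g. "dft" or "biogr".
    record_id: unique identifier; also the JSON filename stem.
    """
    inp_path = data_root / domain / "inputs" / f"{record_id}.json"
    gt_path = data_root / domain / "ground_truth" / f"{record_id}.json"

    inp = json.loads(inp_path.read_text())
    gt = json.loads(gt_path.read_text())

    # Both files carry a record_id field matching the filename.
    assert inp["record_id"] == gt["record_id"] == record_id

    return inp["text"], gt
```

For the "biogr" domain, the corresponding record_id.png image sits alongside the JSON in the inputs folder and can be located the same way.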
For the "biogr" (geo-referencing) task, we release additional data for 114 of the 138 examples, including the PDF papers from which each image was taken, along with other metadata, in this GitHub repo: https://github.com/google-research/ecology-georeferencing