CURIE

CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning

Introduced 2025-03-14

The data is organized into eight domain-specific subfolders: "biogr", "dft", "pdb", "geo", "mpve", "qecc_65", "hfd", and "hfe". Each subfolder contains two further subfolders: "ground_truth" and "inputs". Within these, each data instance is stored in a JSON file named record_id.json, where record_id is a unique identifier. The "biogr" domain also includes image inputs as record_id.png files alongside the corresponding JSON.

data
├── domain
│   ├── inputs
│   │   └── record_id.json
│   └── ground_truth
│       └── record_id.json
└── difficulty_levels.json

Ground truth files vary in structure and content across domains, but every file includes a record_id field matching the filename. Input files share a uniform structure across all domains: each contains a record_id field and a text field holding the input text provided to the LLM.
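Given this layout, a single example can be loaded by pairing the input and ground-truth JSON files for a record. The sketch below assumes the dataset has been downloaded to a local directory; the function name and paths are illustrative, not part of the released code.

```python
import json
from pathlib import Path

def load_example(data_dir: str, domain: str, record_id: str):
    """Load the input text and ground truth for one record.

    Assumes a local copy laid out as
    data/<domain>/{inputs,ground_truth}/<record_id>.json.
    """
    base = Path(data_dir) / domain
    # Input files are uniform: {"record_id": ..., "text": ...}
    with open(base / "inputs" / f"{record_id}.json") as f:
        inp = json.load(f)
    # Ground-truth structure varies by domain, but always carries record_id.
    with open(base / "ground_truth" / f"{record_id}.json") as f:
        gt = json.load(f)
    assert inp["record_id"] == gt["record_id"] == record_id
    return inp["text"], gt
```

For the "biogr" domain, the corresponding record_id.png image sits next to the input JSON and would need to be loaded separately.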

For the "biogr" (georeferencing) task, we release additional data for 114 of the 138 examples, including the PDF papers that each image was taken from along with other metadata, in this GitHub repository: https://github.com/google-research/ecology-georeferencing