TIME

TimE: A Multi-level Benchmark for Temporal Reasoning of LLMs in Real-World Scenario

Texts | CC BY | Introduced 2025-05-19

Related Benchmarks

Time Series Prediction Benchmarks / 2D Semantic Segmentation / 1:3 Accuracy
TimeBank / Information Extraction / F1 score
TimeBank / Temporal Information Extraction / F1 score
TimeBank / Temporal Processing / F1 score
TimeBank / Temporal Processing / F1-Score
TimeBankPT / Information Extraction / F1
TimeBankPT / Temporal Information Extraction / F1
TimeBankPT / Temporal Processing / F1
TimeQuestions / Question Answering / P@1
TimeTravel / Music Auto-Tagging / 0..5sec
Timers and Such / Dialogue / Accuracy (%)
Timers and Such / Dialogue Understanding / Accuracy (%)
Timers and Such / Spoken Language Understanding / Accuracy (%)