Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

ToT

Test of Time

Introduced 2024-06-13

ToT is a benchmark for evaluating LLMs on temporal reasoning.

The dataset is designed to assess the temporal reasoning capabilities of AI models and comprises two key sections:

- ToT-semantic: Measuring the semantics and logic of time understanding.
- ToT-arithmetic: Measuring the ability to carry out time arithmetic operations.
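To make "time arithmetic operations" concrete, the Python sketch below performs three operations of that general kind with the standard `datetime` module: adding a duration that crosses midnight, counting days across a leap-year February, and naming a weekday. The helper names are hypothetical and the inputs are toy values chosen for illustration; they are not questions taken from the ToT-arithmetic data.

```python
from datetime import date, datetime, timedelta

# Illustrative helpers (hypothetical names); the inputs below are toy values,
# not questions from the ToT-arithmetic data.

WEEKDAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")


def add_duration(start: datetime, hours: int = 0, minutes: int = 0) -> datetime:
    """Add a duration to a timestamp; timedelta handles day/month rollover."""
    return start + timedelta(hours=hours, minutes=minutes)


def days_between(a: date, b: date) -> int:
    """Calendar days from a to b (negative if b is earlier than a)."""
    return (b - a).days


def weekday_name(d: date) -> str:
    """English weekday name, independent of the system locale."""
    return WEEKDAYS[d.weekday()]


if __name__ == "__main__":
    print(add_duration(datetime(2024, 6, 13, 22, 30), hours=3, minutes=45))  # 2024-06-14 02:15:00
    print(days_between(date(2024, 2, 1), date(2024, 3, 1)))  # 29 (2024 is a leap year)
    print(weekday_name(date(2024, 6, 13)))  # Thursday
```

Delegating to `timedelta` lets the standard library track rollovers across days, months, and leap years, which is the bookkeeping that makes questions of this kind easy to get wrong by hand.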

Data Format

The ToT-semantic and ToT-semantic-large datasets contain the following fields:

- question: Contains the text of the question.
- graph_gen_algorithm: Contains the name of the graph generator algorithm used to generate the graph.
- question_type: Corresponds to one of the 7 question types in the dataset.
- sorting_type: Corresponds to the sorting type applied to the facts to order them.
- prompt: Contains the full prompt text used to evaluate LLMs on the task.
- label: Contains the ground-truth answer to the question.

The ToT-arithmetic dataset contains the following fields:

- question: Contains the text of the question.
- question_type: Corresponds to one of the 7 question types in the dataset.
- label: Contains the ground truth answer to the question.
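The field lists for both configurations map naturally onto typed records. The following is a minimal sketch, assuming each row arrives as a dict-like mapping keyed by the documented field names, of record types for ToT-semantic and ToT-arithmetic plus a scorer that reports accuracy separately for each question_type. Typing every field as `str`, the helper names `parse_row` and `accuracy_by_question_type`, and the whitespace-stripped exact-match metric are all assumptions for illustration, not the benchmark's official loader or evaluation protocol.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, fields
from typing import Mapping, Sequence, Type, TypeVar, Union


# Field names follow the ToT "Data Format" description; typing every field
# as str is an assumption, not a documented schema.
@dataclass(frozen=True)
class ToTSemanticExample:
    question: str
    graph_gen_algorithm: str  # graph generator used to build the graph
    question_type: str  # one of the 7 question types
    sorting_type: str  # ordering applied to the facts
    prompt: str  # full prompt used to evaluate the LLM
    label: str  # ground-truth answer


@dataclass(frozen=True)
class ToTArithmeticExample:
    question: str
    question_type: str
    label: str


Example = Union[ToTSemanticExample, ToTArithmeticExample]
T = TypeVar("T", ToTSemanticExample, ToTArithmeticExample)


def parse_row(cls: Type[T], row: Mapping[str, object]) -> T:
    """Build a record from a raw row, keeping only the documented fields.

    Extra keys are ignored; a missing field raises KeyError.
    """
    return cls(**{f.name: str(row[f.name]) for f in fields(cls)})


def accuracy_by_question_type(
    examples: Sequence[Example], predictions: Sequence[str]
) -> dict[str, float]:
    """Exact-match accuracy per question_type.

    Hypothetical metric: compares whitespace-stripped strings only; ToT's
    official scoring may differ.
    """
    if len(examples) != len(predictions):
        raise ValueError("need exactly one prediction per example")
    hits: defaultdict[str, int] = defaultdict(int)
    totals: defaultdict[str, int] = defaultdict(int)
    for example, prediction in zip(examples, predictions):
        totals[example.question_type] += 1
        if prediction.strip() == example.label.strip():
            hits[example.question_type] += 1
    return {qtype: hits[qtype] / totals[qtype] for qtype in totals}


if __name__ == "__main__":
    # Toy rows with made-up values, purely for illustration.
    rows = [
        {"question": "q1", "question_type": "type_a", "label": "10:15", "extra": 1},
        {"question": "q2", "question_type": "type_a", "label": "Monday"},
        {"question": "q3", "question_type": "type_b", "label": "42"},
    ]
    examples = [parse_row(ToTArithmeticExample, row) for row in rows]
    print(accuracy_by_question_type(examples, ["10:15 ", "Tuesday", "42"]))
    # prints {'type_a': 0.5, 'type_b': 1.0}
```

Grouping by question_type rather than reporting a single number keeps weaknesses on one of the 7 question types from being averaged away; a stricter or more lenient answer comparison can be swapped in without touching the record types.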

Related Benchmarks

- ToTTo/Data-to-Text Generation/BLEU
- ToTTo/Data-to-Text Generation/METEOR
- ToTTo/Data-to-Text Generation/PARENT
- ToTTo/Text Generation/BLEU
- ToTTo/Text Generation/METEOR
- ToTTo/Text Generation/PARENT
- Total Capture/1 Image, 2*2 Stitchi/Average MPJPE (mm)
- Total Capture/1 Image, 2*2 Stitchi/MPJPE
- Total Capture/3D/Average MPJPE (mm)
- Total Capture/3D/MPJPE
- Total Capture/3D Absolute Human Pose Estimation/MPJPE
- Total Capture/3D Human Pose Estimation/Average MPJPE (mm)
- Total Capture/3D Human Pose Estimation/MPJPE
- Total Capture/Pose Estimation/Average MPJPE (mm)
- Total Capture/Pose Estimation/MPJPE
- Total-Text/Scene Text Detection/F-Measure
- Total-Text/Scene Text Detection/FPS
- Total-Text/Scene Text Detection/Precision
- Total-Text/Scene Text Detection/Recall
- Total-Text/Text Spotting/F-measure (%) - Full Lexicon
- Total-Text/Text Spotting/F-measure (%) - No Lexicon

Statistics

Papers: 5
Benchmarks: 0
