InfiniteBench

∞Bench: Extending Long Context Evaluation Beyond 100K Tokens

TextsMIT licenseIntroduced 2024-02-21

Introduction

Welcome to InfiniteBench, a cutting-edge benchmark tailored for evaluating the capabilities of language models to process, understand, and reason over super long contexts (100k+ tokens). Long contexts are crucial for enhancing applications with LLMs and achieving high-level interaction. InfiniteBench is designed to push the boundaries of language models by testing them against a context length of 100k+, which is 10 times longer than traditional datasets.

Features

  • Loooong Context: InfiniteBench is a pioneer in testing language models with a context length of 100k+, offering an unparalleled challenge in the field.
  • Diverse Domain: The benchmark comprises 12 unique tasks, each crafted to assess different aspects of language processing and comprehension in extended contexts.
  • Specialized Test: InfiniteBench consists of tasks that state-of-the-art LLMs are known to be capable of when using shorter context. This ensures that the performance degradation is only caused by the length of the contexts.
  • Real-World and Synthetic Scenarios: The tasks are a mix of real-world scenarios and synthetic constructs, ensuring a comprehensive evaluation of models. Real-world scenarios make the test pragmatic, and synthetic ones leave the space for extending the context length further with ease.