Needle in a Needlestack (NIAN) is a new benchmark designed to measure how well large language models (LLMs) pay attention to the information in their context window¹.

In this benchmark, each prompt contains thousands of limericks and asks a question about one limerick at a specific location¹. The goal is to test the ability of LLMs to locate and understand specific information within a large context.

For example, a prompt might include around 2500 limericks, and the LLM is asked a question about a limerick at a certain position¹. The LLM's task is to correctly answer the question by paying attention to the right part of the context.
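
The prompt construction described above can be sketched in a few lines of Python. This is a minimal illustration, not the NIAN project's actual code: the function name `build_nian_prompt`, the prompt wording, the filler text, and the sample needle limerick are all assumptions made up for this example.

```python
def build_nian_prompt(haystack, needle, position, question):
    """Place `needle` at 0-based `position` among `haystack` and append a question.

    Hypothetical helper for illustration; the real benchmark may format
    its prompts differently.
    """
    if not 0 <= position <= len(haystack):
        raise ValueError("position must be between 0 and len(haystack)")
    # Splice the needle limerick into the list at the requested position.
    limericks = haystack[:position] + [needle] + haystack[position:]
    body = "\n\n".join(limericks)
    return (
        "Below is a collection of limericks.\n\n"
        f"{body}\n\n"
        f"Question: {question}\n"
        "Answer using only the limericks above."
    )


# Placeholder stand-ins for real limericks (assumed data, not from the benchmark).
haystack = [f"Filler limerick number {i}." for i in range(2500)]
needle = "A baker named Lou baked a pie of blue glue."
question = "What did Lou the baker bake?"

prompt = build_nian_prompt(haystack, needle, 1200, question)
```

Varying `position` across the prompt and scoring the model's answers at each position would then show where in the context window attention degrades, which is the kind of measurement the benchmark is described as making.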

(1) GPT-4o’s Memory Breakthrough! (NIAN code) | needle-in-a-needlestack. http://nian.llmonpy.ai/.
(2) [P] Needle in a Needlestack (NIAN) | allainews.com. https://allainews.com/item/p-needle-in-a-needlestack-nian-2024-05-16/.
(3) GitHub - llmonpy/needle-in-a-needlestack. https://github.com/llmonpy/needle-in-a-needlestack/.