Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


A Training-Free Length Extrapolation Approach for LLMs: Greedy Attention Logit Interpolation (GALI)

Yan Li, Tianyi Zhang, Zechuan Li, Soyeon Caren Han

2025-02-04 · Long-Context Understanding

Paper | PDF | Code (official)

Abstract

Transformer-based Large Language Models (LLMs) struggle to process inputs that exceed their training context window, with performance degrading due to positional out-of-distribution (O.O.D.) issues that disrupt attention computations. Existing solutions, both fine-tuning and training-free methods, are limited by computational inefficiency, attention logit outliers, or loss of local positional information. To address this, we propose Greedy Attention Logit Interpolation (GALI), a training-free length extrapolation method that maximizes the utilization of pretrained positional intervals while avoiding attention logit outliers through attention logit interpolation. Our results demonstrate that GALI consistently outperforms state-of-the-art training-free methods. Our findings reveal that LLMs interpret positional intervals unevenly within their training context window, suggesting that extrapolating within a smaller positional interval range yields superior results, even for short-context tasks. GALI represents a significant step toward resolving the positional O.O.D. challenge, enabling more reliable long-text understanding in LLMs. Our implementation of GALI, along with the experiments from our paper, is open-sourced at https://github.com/AcademyCityL/GALI.
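The core idea of attention logit interpolation can be illustrated with a small sketch. This is not the paper's implementation (see the linked repository for that); it is a minimal, self-contained illustration under the assumption of a RoPE-style model, where the logit at a fractional (interpolated) relative position is estimated by linearly interpolating the logits computed at the two neighbouring integer positions that the model saw during pretraining, rather than by feeding an unseen position id directly into the rotary embedding. The helper names `rope_rotate`, `attn_logit`, and `interpolated_logit` are hypothetical.

```python
import numpy as np

def rope_rotate(x, pos, base=10000.0):
    """Apply a GPT-NeoX-style rotary position embedding to a 1-D vector
    at position `pos` (which may be fractional)."""
    half = x.shape[-1] // 2
    freqs = base ** (-np.arange(half) / half)   # per-dimension rotation frequencies
    angles = pos * freqs
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:half], x[half:]
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos])

def attn_logit(q, k, rel_pos):
    """Scaled dot-product attention logit when the key sits `rel_pos`
    positions before the query (RoPE depends only on relative position)."""
    return np.dot(rope_rotate(q, 0.0), rope_rotate(k, -rel_pos)) / np.sqrt(len(q))

def interpolated_logit(q, k, frac_pos):
    """Estimate the logit at a fractional relative position by linearly
    interpolating the logits at the two neighbouring integer positions,
    so the rotary embedding itself only ever sees pretrained positions."""
    lo, hi = np.floor(frac_pos), np.ceil(frac_pos)
    if lo == hi:
        return attn_logit(q, k, lo)
    w = frac_pos - lo
    return (1.0 - w) * attn_logit(q, k, lo) + w * attn_logit(q, k, hi)
```

Because the interpolation happens in logit space, the estimate is a convex combination of two in-distribution logits and therefore cannot produce a value outside their range, which is the intuition behind avoiding attention logit outliers.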

Results

Task | Dataset | Metric | Value | Model
--- | --- | --- | --- | ---
Long-Context Understanding | LongBench | Average Score | 46.22 | GALI (Llama3-8b-ins-4k-to-16k)
Long-Context Understanding | LongBench | Average Score | 45.38 | GALI (Llama3-8b-ins-8k-to-32k)
Long-Context Understanding | LongBench | Average Score | 45.17 | GALI (Llama3-8b-ins-8k-to-16k)
Long-Context Understanding | L-Eval | Average Score | 59.21 | GALI (Llama3-8b-ins-4k-to-16k)
Long-Context Understanding | L-Eval | Average Score | 59.10 | GALI (Llama3-8b-ins-4k-to-32k)
Long-Context Understanding | L-Eval | Average Score | 42.79 | GALI (Llama3-8b-ins-8k-to-32k)
Long-Context Understanding | L-Eval | Average Score | 42.32 | GALI (Llama3-8b-ins-8k-to-16k)

Related Papers

Ref-Long: Benchmarking the Long-context Referencing Capability of Long-context Language Models (2025-07-13)
Cache Me If You Can: How Many KVs Do You Need for Effective Long-Context LMs? (2025-06-20)
PaceLLM: Brain-Inspired Large Language Models for Long-Context Understanding (2025-06-18)
DAM: Dynamic Attention Mask for Long-Context Large Language Model Inference Acceleration (2025-06-06)
MesaNet: Sequence Modeling by Locally Optimal Test-Time Training (2025-06-05)
ATLAS: Learning to Optimally Memorize the Context at Test Time (2025-05-29)
SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences (2025-05-27)
MiniLongBench: The Low-cost Long Context Understanding Benchmark for Large Language Models (2025-05-26)