Description
Attention sinks are token positions, most often the first token(s) of a sequence, that receive disproportionately large attention weights from many heads regardless of their semantic relevance. The phenomenon was characterized in "Efficient Streaming Language Models with Attention Sinks" (Xiao et al., 2023), which showed that softmax attention tends to dump excess probability mass onto these initial tokens, and that retaining the key-value states of a few sink tokens alongside a sliding window of recent tokens lets language models stream over inputs far longer than their training context without fine-tuning.
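As a minimal illustration of the sink-plus-sliding-window cache policy described above, the sketch below computes which token positions a StreamingLLM-style KV cache would retain. The function name and parameter names (`n_sink`, `window`) are illustrative, not taken from any specific library.

```python
# Sketch of a sink-aware KV cache eviction policy: keep the first
# `n_sink` token positions (the attention sinks) plus a sliding window
# of the most recent `window` positions, evicting everything in between.

def sink_cache_positions(seq_len, n_sink=4, window=8):
    """Return the sorted token positions retained in the KV cache."""
    if seq_len <= n_sink + window:
        # Everything still fits; nothing is evicted.
        return list(range(seq_len))
    sinks = list(range(n_sink))                    # initial sink tokens
    recent = list(range(seq_len - window, seq_len))  # sliding window
    return sinks + recent
```

For example, with `n_sink=2` and `window=3`, a 20-token sequence keeps positions `[0, 1, 17, 18, 19]`: the two sinks absorb the excess attention mass, while the window preserves local context.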
Papers Using This Method
- MARS-Bench: A Multi-turn Athletic Real-world Scenario Benchmark for Dialogue Evaluation (2025-05-27)
- Analysis of Attention in Video Diffusion Transformers (2025-04-14)
- Why do LLMs attend to the first token? (2025-04-03)
- Interpreting the Repeated Token Phenomenon in Large Language Models (2025-03-11)
- Attention Sinks and Outlier Features: A 'Catch, Tag, and Release' Mechanism for Embeddings (2025-02-02)
- Task-KV: Task-aware KV Cache Optimization via Semantic Differentiation of Attention Heads (2025-01-25)
- Attention Entropy is a Key Factor: An Analysis of Parallel Context Encoding with Full-attention-based Pre-trained Language Models (2024-12-21)
- Seeing Clearly by Layer Two: Enhancing Attention Heads to Alleviate Hallucination in LVLMs (2024-11-15)
- Value Residual Learning For Alleviating Attention Concentration In Transformers (2024-10-23)
- When Attention Sink Emerges in Language Models: An Empirical View (2024-10-14)
- Does RoBERTa Perform Better than BERT in Continual Learning: An Attention Sink Perspective (2024-10-08)
- Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration (2024-06-22)
- Efficient Streaming Language Models with Attention Sinks (2023-09-29)