Description
Attention sinks are token positions, most often the first token(s) of a sequence, that receive disproportionately large attention weights from many heads regardless of their semantic relevance. The phenomenon was characterized in "Efficient Streaming Language Models with Attention Sinks" (Xiao et al., 2023), which showed that softmax attention tends to dump excess probability mass onto these initial tokens, and that retaining the key-value states of a few sink tokens alongside a sliding window of recent tokens lets language models stream over inputs far longer than their training context without fine-tuning.
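As a minimal illustration of the sink-plus-sliding-window cache policy described above, the sketch below computes which token positions a StreamingLLM-style KV cache would retain. The function name and parameter names (`n_sink`, `window`) are illustrative, not taken from any specific library.

```python
# Sketch of a sink-aware KV cache eviction policy: keep the first
# `n_sink` token positions (the attention sinks) plus a sliding window
# of the most recent `window` positions, evicting everything in between.

def sink_cache_positions(seq_len, n_sink=4, window=8):
    """Return the sorted token positions retained in the KV cache."""
    if seq_len <= n_sink + window:
        # Everything still fits; nothing is evicted.
        return list(range(seq_len))
    sinks = list(range(n_sink))                    # initial sink tokens
    recent = list(range(seq_len - window, seq_len))  # sliding window
    return sinks + recent
```

For example, with `n_sink=2` and `window=3`, a 20-token sequence keeps positions `[0, 1, 17, 18, 19]`: the two sinks absorb the excess attention mass, while the window preserves local context.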
Papers Using This Method
- MARS-Bench: A Multi-turn Athletic Real-world Scenario Benchmark for Dialogue Evaluation (2025-05-27)
- Analysis of Attention in Video Diffusion Transformers (2025-04-14)
- Why do LLMs attend to the first token? (2025-04-03)
- Interpreting the Repeated Token Phenomenon in Large Language Models (2025-03-11)
- Attention Sinks and Outlier Features: A 'Catch, Tag, and Release' Mechanism for Embeddings (2025-02-02)
- Task-KV: Task-aware KV Cache Optimization via Semantic Differentiation of Attention Heads (2025-01-25)
- Attention Entropy is a Key Factor: An Analysis of Parallel Context Encoding with Full-attention-based Pre-trained Language Models (2024-12-21)
- Seeing Clearly by Layer Two: Enhancing Attention Heads to Alleviate Hallucination in LVLMs (2024-11-15)
- Value Residual Learning For Alleviating Attention Concentration In Transformers (2024-10-23)
- When Attention Sink Emerges in Language Models: An Empirical View (2024-10-14)
- Does RoBERTa Perform Better than BERT in Continual Learning: An Attention Sink Perspective (2024-10-08)
- Unveiling and Harnessing Hidden Attention Sinks: Enhancing Large Language Models without Training through Attention Calibration (2024-06-22)
- Efficient Streaming Language Models with Attention Sinks (2023-09-29)