Description
Activation patching studies a model's computation by altering its latent representations (in transformer-based language models, the token embeddings) during inference and observing how the output changes.
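As a rough illustration, the sketch below applies the idea to a toy two-layer network written in plain Python. It assumes the commonly used clean/corrupted setup, which the description above does not spell out: activations are cached from a run on a "clean" input, then written one at a time into a run on a "corrupted" input, and the change in output is measured. The weights, the inputs, and the names `forward` and `patching_effects` are hypothetical and chosen only for this example; a real transformer would be patched at its residual stream, attention heads, or MLP outputs instead.

```python
# Toy two-layer network; all weights are made up for illustration.
W1 = [[1.0, -1.0], [0.5, 2.0], [-1.0, 1.0]]  # hidden units x input features
W2 = [2.0, -1.0, 0.5]                        # linear readout over hidden units


def forward(x, patch=None):
    """Run the toy model.

    `patch` optionally maps a hidden-unit index to a value that overwrites
    that unit's activation before the readout is applied.
    """
    hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in W1]
    if patch:
        for idx, value in patch.items():
            hidden[idx] = value
    output = sum(w * h for w, h in zip(W2, hidden))
    return output, hidden


clean_x = [1.0, 0.0]    # input on which the model shows the behavior of interest
corrupt_x = [0.0, 1.0]  # contrasting input on which it does not

clean_out, clean_hidden = forward(clean_x)  # cache the clean activations
corrupt_out, _ = forward(corrupt_x)         # baseline output without patching


def patching_effects():
    """Patch each hidden unit in turn and report the fraction of the
    clean-vs-corrupted output gap that the patch restores."""
    effects = []
    for i in range(len(W1)):
        patched_out, _ = forward(corrupt_x, patch={i: clean_hidden[i]})
        effects.append((patched_out - corrupt_out) / (clean_out - corrupt_out))
    return effects


if __name__ == "__main__":
    for i, effect in enumerate(patching_effects()):
        print(f"hidden unit {i}: restored fraction = {effect:.3f}")
```

A unit whose patch restores a large fraction of the gap is a candidate for carrying the information that separates the two inputs. A negative value means that patching the unit moves the output further from the clean result. Because the readout in this toy model is linear, the per-unit fractions sum to one; that property should not be expected in a deep nonlinear network.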
Papers Using This Method
Data Augmentation in Time Series Forecasting through Inverted Framework (2025-07-15)
Adversarial Activation Patching: A Framework for Detecting and Mitigating Emergent Deception in Safety-Aligned Transformers (2025-07-12)
Unpatchable Vulnerabilities in Windows 10/11: Security Report 2025 (2025-07-10)
Q-resafe: Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models (2025-06-25)
SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks (2025-06-13)
Time Series Representations for Classification Lie Hidden in Pretrained Vision Transformers (2025-06-10)
Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models (2025-06-09)
A Multi-Dataset Evaluation of Models for Automated Vulnerability Repair (2025-06-05)
Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation (2025-05-29)
From What to How: Attributing CLIP's Latent Components Reveals Unexpected Semantic Reliance (2025-05-26)
Foundation Model for Wireless Technology Recognition Using IQ Timeseries (2025-05-26)
Co-PatcheR: Collaborative Software Patching with Component(s)-specific Small Reasoning Models (2025-05-25)
Know the Ropes: A Heuristic Strategy for LLM-based Multi-Agent System Design (2025-05-22)
Internal Chain-of-Thought: Empirical Evidence for Layer-wise Subtask Scheduling in LLMs (2025-05-20)
SPIRIT: Patching Speech Language Models against Jailbreak Attacks (2025-05-18)
SPAT: Sensitivity-based Multihead-attention Pruning on Time Series Forecasting Models (2025-05-13)
Unpacking Robustness in Inflectional Languages: Adversarial Evaluation and Mechanistic Insights (2025-05-08)
Interpreting Multilingual and Document-Length Sensitive Relevance Computations in Neural Retrieval Models through Axiomatic Causal Interventions (2025-05-04)
The Illusion of Role Separation: Hidden Shortcuts in LLM Role Learning (and How to Fix Them) (2025-05-01)
HSE: A plug-and-play module for unified fault diagnosis foundation models (2025-04-26)