Patching

Activation Patching

Natural Language Processing | Introduced 2000 | 102 papers

Description

Activation patching studies a model's computation by altering its latent representations, such as the token embeddings in transformer-based language models, during inference.
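
One common way to set this up is to run the model on a "clean" input and a "corrupted" input, then overwrite a single latent activation in the corrupted run with its clean value and see how much of the clean output returns. The sketch below illustrates that idea on a toy two-layer network in plain Python. It is a minimal illustration, not a reference implementation: the weights, the `forward` and `patching_effects` helpers, and the two inputs are all hypothetical, and a real study would patch activations inside an actual transformer.

```python
import math

# Toy two-layer network standing in for a transformer.
# All weights are made up for illustration (assumption, not from the source).
W_HIDDEN = [[1.0, -1.0], [0.5, 0.5]]   # input -> 2 hidden units
W_OUT = [2.0, 0.0]                      # output reads only hidden unit 0


def forward(x, patch=None):
    """Run the toy model; `patch` maps hidden-unit index -> value to overwrite."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in W_HIDDEN]
    for index, value in (patch or {}).items():
        hidden[index] = value           # the intervention: replace a latent activation
    output = sum(w * h for w, h in zip(W_OUT, hidden))
    return output, hidden


def patching_effects(clean_x, corrupt_x):
    """Fraction of the clean-vs-corrupt output gap restored by patching each hidden unit."""
    clean_out, clean_hidden = forward(clean_x)
    corrupt_out, _ = forward(corrupt_x)
    gap = clean_out - corrupt_out       # assumed non-zero for the chosen inputs
    effects = []
    for index, clean_value in enumerate(clean_hidden):
        # Re-run the corrupted input with one hidden unit set to its clean value.
        patched_out, _ = forward(corrupt_x, patch={index: clean_value})
        effects.append((patched_out - corrupt_out) / gap)
    return effects


effects = patching_effects(clean_x=[1.0, 0.0], corrupt_x=[0.0, 1.0])
print(effects)  # [1.0, 0.0]
```

In this toy setup, patching hidden unit 0 restores the full clean output, while patching unit 1 changes nothing, because the output weights ignore unit 1. That contrast is the kind of signal activation patching is used to find: which internal components the output actually depends on.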

Papers Using This Method

Data Augmentation in Time Series Forecasting through Inverted Framework (2025-07-15)
Adversarial Activation Patching: A Framework for Detecting and Mitigating Emergent Deception in Safety-Aligned Transformers (2025-07-12)
Unpatchable Vulnerabilities in Windows 10/11: Security Report 2025 (2025-07-10)
Q-resafe: Assessing Safety Risks and Quantization-aware Safety Patching for Quantized Large Language Models (2025-06-25)
SEC-bench: Automated Benchmarking of LLM Agents on Real-World Software Security Tasks (2025-06-13)
Time Series Representations for Classification Lie Hidden in Pretrained Vision Transformers (2025-06-10)
Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models (2025-06-09)
A Multi-Dataset Evaluation of Models for Automated Vulnerability Repair (2025-06-05)
Can LLMs Reason Abstractly Over Math Word Problems Without CoT? Disentangling Abstract Formulation From Arithmetic Computation (2025-05-29)
From What to How: Attributing CLIP's Latent Components Reveals Unexpected Semantic Reliance (2025-05-26)
Foundation Model for Wireless Technology Recognition Using IQ Timeseries (2025-05-26)
Co-PatcheR: Collaborative Software Patching with Component(s)-specific Small Reasoning Models (2025-05-25)
Know the Ropes: A Heuristic Strategy for LLM-based Multi-Agent System Design (2025-05-22)
Internal Chain-of-Thought: Empirical Evidence for Layer-wise Subtask Scheduling in LLMs (2025-05-20)
SPIRIT: Patching Speech Language Models against Jailbreak Attacks (2025-05-18)
SPAT: Sensitivity-based Multihead-attention Pruning on Time Series Forecasting Models (2025-05-13)
Unpacking Robustness in Inflectional Languages: Adversarial Evaluation and Mechanistic Insights (2025-05-08)
Interpreting Multilingual and Document-Length Sensitive Relevance Computations in Neural Retrieval Models through Axiomatic Causal Interventions (2025-05-04)
The Illusion of Role Separation: Hidden Shortcuts in LLM Role Learning (and How to Fix Them) (2025-05-01)
HSE: A plug-and-play module for unified fault diagnosis foundation models (2025-04-26)