Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Sparse Autoencoder

Computer Vision | Introduced 2000 | 57 papers

Description

A Sparse Autoencoder is a type of autoencoder that uses sparsity to create an information bottleneck. Specifically, the loss function is constructed so that activations within a layer are penalized. The sparsity constraint can be imposed with L1 regularization on the activations, or with a KL divergence between each neuron's expected average activation and a target distribution $p$.

Image: Jeff Jordan. Read his blog post for a detailed summary of autoencoders.
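
The two penalties described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function names, the single sigmoid hidden layer, the squared-error reconstruction term, the weight `lam`, and the reading of $p$ as the mean of a Bernoulli target (so the KL term compares $p$ with each unit's batch-averaged activation) are all assumptions made for the example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def l1_penalty(h):
    """L1 sparsity term: summed absolute activations, averaged over the batch."""
    return np.abs(h).sum(axis=1).mean()


def kl_penalty(h, p=0.05, eps=1e-8):
    """KL sparsity term, assuming a Bernoulli target with mean p.

    p_hat[j] is the average activation of hidden unit j over the batch;
    the penalty is the sum over units of KL(Bernoulli(p) || Bernoulli(p_hat[j])).
    """
    p_hat = np.clip(h.mean(axis=0), eps, 1.0 - eps)
    return np.sum(
        p * np.log(p / p_hat) + (1.0 - p) * np.log((1.0 - p) / (1.0 - p_hat))
    )


def sparse_ae_loss(x, W_enc, b_enc, W_dec, b_dec, lam=1e-3, penalty="l1", p=0.05):
    """Reconstruction error plus a weighted sparsity penalty on the hidden layer.

    Returns (total, reconstruction, sparsity) so the terms can be inspected.
    """
    h = sigmoid(x @ W_enc + b_enc)   # hidden activations in (0, 1)
    x_hat = h @ W_dec + b_dec        # linear decoder (an assumption)
    recon = np.mean(np.sum((x - x_hat) ** 2, axis=1))
    sparsity = l1_penalty(h) if penalty == "l1" else kl_penalty(h, p)
    return recon + lam * sparsity, recon, sparsity


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(8, 5))                      # toy batch: 8 samples, 5 features
    W_enc, b_enc = rng.normal(size=(5, 3)), np.zeros(3)
    W_dec, b_dec = rng.normal(size=(3, 5)), np.zeros(5)
    for kind in ("l1", "kl"):
        total, recon, sparsity = sparse_ae_loss(
            x, W_enc, b_enc, W_dec, b_dec, penalty=kind
        )
        print(kind, total, recon, sparsity)
```

In this sketch the KL term is zero exactly when every hidden unit's average activation equals `p`, while the L1 term is zero only when all activations are zero; `lam` trades either penalty off against reconstruction quality.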

Papers Using This Method

Bridging Compositional and Distributional Semantics: A Survey on Latent Semantic Geometry via AutoEncoder (2025-06-25)
CWGAN-GP Augmented CAE for Jamming Detection in 5G-NR in Non-IID Datasets (2025-06-18)
Resa: Transparent Reasoning Models via SAEs (2025-06-11)
Model Unlearning via Sparse Autoencoder Subspace Guided Projections (2025-05-30)
SAE-FiRE: Enhancing Earnings Surprise Predictions Through Sparse Autoencoder Feature Selection (2025-05-20)
Are Sparse Autoencoders Useful for Java Function Bug Detection? (2025-05-15)
Interpretable Risk Mitigation in LLM Agent Systems (2025-05-15)
Beyond Input Activations: Identifying Influential Latents by Gradient Sparse Autoencoders (2025-05-12)
Decoding Futures Price Dynamics: A Regularized Sparse Autoencoder for Interpretable Multi-Horizon Forecasting and Factor Discovery (2025-05-11)
Geospatial Mechanistic Interpretability of Large Language Models (2025-05-06)
FineScope: Precision Pruning for Domain-Specialized Large Language Models Using SAE-Guided Self-Data Cultivation (2025-05-01)
Towards Understanding the Nature of Attention with Low-Rank Sparse Decomposition (2025-04-29)
Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video (2025-04-28)
A real-time anomaly detection method for robots based on a flexible and sparse latent space (2025-04-15)
Dissecting and Mitigating Diffusion Bias via Mechanistic Interpretability (2025-03-26)
Sparse Autoencoder as a Zero-Shot Classifier for Concept Erasing in Text-to-Image Diffusion Models (2025-03-12)
Route Sparse Autoencoder to Interpret Large Language Models (2025-03-11)
Self-Regularization with Latent Space Explanations for Controllable LLM-based Classification (2025-02-19)
LLM Pretraining with Continuous Concepts (2025-02-12)
Sparse Autoencoders for Hypothesis Generation (2025-02-05)