Mamba

Mamba: Linear-Time Sequence Modeling with Selective State Spaces

Introduced 2000 | 599 papers

Description

Foundation models, now powering most of the exciting applications in deep learning, are almost universally based on the Transformer architecture and its core attention module. Many subquadratic-time architectures such as linear attention, gated convolution and recurrent models, and structured state space models (SSMs) have been developed to address Transformers’ computational inefficiency on long sequences, but they have not performed as well as attention on important modalities such as language. We identify that a key weakness of such models is their inability to perform content-based reasoning, and make several improvements. First, simply letting the SSM parameters be functions of the input addresses their weakness with discrete modalities, allowing the model to selectively propagate or forget information along the sequence length dimension depending on the current token. Second, even though this change prevents the use of efficient convolutions, we design a hardware-aware parallel algorithm in recurrent mode. We integrate these selective SSMs into a simplified end-to-end neural network architecture without attention or even MLP blocks (Mamba). Mamba enjoys fast inference (5× higher throughput than Transformers) and linear scaling in sequence length, and its performance improves on real data up to million-length sequences. As a general sequence model backbone, Mamba achieves state-of-the-art performance across several modalities such as language, audio, and genomics. On language modeling, our Mamba-3B model outperforms Transformers of the same size and matches Transformers twice its size, both in pre-training and downstream evaluation.
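The selection mechanism described above can be illustrated with a minimal NumPy sketch of a selective SSM run in recurrent mode. This is an illustration under stated assumptions, not the paper's hardware-aware kernel: the function name `selective_scan`, the projection matrices `W_delta`, `W_B`, `W_C`, the bias `b_delta`, the diagonal state matrix `A`, and the simplified discretization (`exp(delta * A)` for the state transition, `delta * B` for the input) are all choices made here for clarity. The point it shows is that the step size and the input/output projections are computed from the current token, so each token decides how much of the running state to keep or overwrite, while the cost stays linear in sequence length.

```python
import numpy as np


def softplus(z):
    # Numerically stable log(1 + exp(z)); keeps the step size positive.
    return np.logaddexp(0.0, z)


def selective_scan(x, A, W_delta, b_delta, W_B, W_C):
    """Sequential selective SSM (illustrative sketch, hypothetical names).

    x:       (L, D) input sequence, L tokens with D channels
    A:       (D, N) diagonal state matrix per channel (negative entries)
    W_delta: (D, D), b_delta: (D,)  projection producing the step size
    W_B:     (D, N)  projection producing the input matrix B_t
    W_C:     (D, N)  projection producing the output matrix C_t
    Returns y of shape (L, D).
    """
    L, D = x.shape
    N = A.shape[1]
    h = np.zeros((D, N))           # hidden state, one N-vector per channel
    y = np.empty((L, D))
    for t in range(L):             # one pass over the sequence: O(L)
        # Input-dependent parameters: this is the "selective" part.
        delta = softplus(x[t] @ W_delta + b_delta)   # (D,)
        B_t = x[t] @ W_B                             # (N,)
        C_t = x[t] @ W_C                             # (N,)
        # Assumed simplified discretization of the continuous system.
        A_bar = np.exp(delta[:, None] * A)           # (D, N), values in (0, 1)
        B_bar = delta[:, None] * B_t[None, :]        # (D, N)
        # Recurrence: decay the old state, then write the current token.
        h = A_bar * h + B_bar * x[t][:, None]
        y[t] = h @ C_t                               # (D,)
    return y


rng = np.random.default_rng(0)
L, D, N = 16, 4, 8
x = rng.normal(size=(L, D))
A = -np.exp(rng.normal(size=(D, N)))   # negative, so the state decays
W_delta = 0.1 * rng.normal(size=(D, D))
b_delta = np.zeros(D)
W_B = rng.normal(size=(D, N))
W_C = rng.normal(size=(D, N))

y = selective_scan(x, A, W_delta, b_delta, W_B, W_C)
print(y.shape)
```

A small step size leaves `A_bar` close to 1 and `B_bar` close to 0, so the token is effectively ignored and the state is carried forward; a large step size does the opposite and lets the token overwrite the state. Because these quantities vary per token, the model is no longer time-invariant and cannot be computed as a single convolution, which is why the abstract turns to a parallel algorithm in recurrent mode instead.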

Papers Using This Method

Differential Mamba (2025-07-08)
LangMamba: A Language-driven Mamba Framework for Low-dose CT Denoising with Vision-language Models (2025-07-08)
FindRec: Stein-Guided Entropic Flow for Multi-Modal Sequential Recommendation (2025-07-07)
MambaFusion: Height-Fidelity Dense Global Fusion for Multi-modal 3D Object Detection (2025-07-06)
MVNet: Hyperspectral Remote Sensing Image Classification Based on Hybrid Mamba-Transformer Vision Backbone Architecture (2025-07-06)
Mamba Guided Boundary Prior Matters: A New Perspective for Generalized Polyp Segmentation (2025-07-02)
MambAttention: Mamba with Multi-Head Attention for Generalizable Single-Channel Speech Enhancement (2025-07-01)
Mamba-FETrack V2: Revisiting State Space Model for Frame-Event based Visual Object Tracking (2025-06-30)
EAMamba: Efficient All-Around Vision State Space Model for Image Restoration (2025-06-27)
EAGLE: An Efficient Global Attention Lesion Segmentation Model for Hepatic Echinococcosis (2025-06-25)
FlightKooba: A Fast Interpretable FTP Model (2025-06-24)
JCAPT: A Joint Modeling Approach for CAPT (2025-06-24)
Memba: Membrane-driven Parameter-Efficient Fine-Tuning for Mamba (2025-06-22)
VMRA-MaR: An Asymmetry-Aware Temporal Framework for Longitudinal Breast Cancer Risk Prediction (2025-06-20)
EDNet: A Distortion-Agnostic Speech Enhancement Framework with Gating Mamba Mechanism and Phase Shift-Invariant Training (2025-06-19)
FADPNet: Frequency-Aware Dual-Path Network for Face Super-Resolution (2025-06-17)
Scaling Algorithm Distillation for Continuous Control with Mamba (2025-06-16)
Stereo sound event localization and detection based on PSELDnet pretraining and BiMamba sequence modeling (2025-06-16)
MT-PCR: A Hybrid Mamba-Transformer with Spatial Serialization for Hierarchical Point Cloud Registration (2025-06-16)
Sequential-Parallel Duality in Prefix Scannable Models (2025-06-12)