
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

How to Train Your HiPPO: State Space Models with Generalized Orthogonal Basis Projections

Albert Gu, Isys Johnson, Aman Timalsina, Atri Rudra, Christopher Ré

2022-06-24 · Long-range modeling

Abstract

Linear time-invariant state space models (SSM) are a classical model from engineering and statistics, that have recently been shown to be very promising in machine learning through the Structured State Space sequence model (S4). A core component of S4 involves initializing the SSM state matrix to a particular matrix called a HiPPO matrix, which was empirically important for S4's ability to handle long sequences. However, the specific matrix that S4 uses was actually derived in previous work for a particular time-varying dynamical system, and the use of this matrix as a time-invariant SSM had no known mathematical interpretation. Consequently, the theoretical mechanism by which S4 models long-range dependencies actually remains unexplained. We derive a more general and intuitive formulation of the HiPPO framework, which provides a simple mathematical interpretation of S4 as a decomposition onto exponentially-warped Legendre polynomials, explaining its ability to capture long dependencies. Our generalization introduces a theoretically rich class of SSMs that also lets us derive more intuitive S4 variants for other bases such as the Fourier basis, and explains other aspects of training S4, such as how to initialize the important timescale parameter. These insights improve S4's performance to 86% on the Long Range Arena benchmark, with 96% on the most difficult Path-X task.
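To make the abstract's central object concrete, here is a minimal sketch of a HiPPO-initialized linear time-invariant SSM, x'(t) = A x(t) + B u(t). It is not the paper's implementation. The matrix entries follow the HiPPO-LegS formula as commonly stated in the earlier HiPPO work (an assumption here, since this page does not give it), and the function names, the bilinear discretization, and the step size `dt` are illustrative choices.

```python
import numpy as np

def hippo_legs(n):
    """Build an n x n HiPPO-LegS state matrix A and input vector B.

    Assumed formula (not stated on this page):
      A[i, j] = -sqrt(2i+1) * sqrt(2j+1)  if i > j
      A[i, i] = -(i + 1)
      A[i, j] = 0                         if i < j
      B[i]    = sqrt(2i+1)
    """
    q = np.sqrt(2 * np.arange(n) + 1.0)
    # Strictly lower triangle from the outer product, then the diagonal.
    A = -np.tril(np.outer(q, q), k=-1) - np.diag(np.arange(n) + 1.0)
    return A, q.copy()

def discretize_bilinear(A, B, dt):
    """Bilinear (Tustin) discretization of x' = A x + B u with step dt."""
    I = np.eye(A.shape[0])
    inv = np.linalg.inv(I - (dt / 2) * A)
    return inv @ (I + (dt / 2) * A), inv @ (dt * B)

def run_ssm(Ad, Bd, u):
    """Unroll x_k = Ad x_{k-1} + Bd u_k over a 1-D input sequence u."""
    x = np.zeros(Ad.shape[0])
    states = []
    for u_k in u:
        x = Ad @ x + Bd * u_k
        states.append(x)
    return np.stack(states)  # shape (len(u), n)

A, B = hippo_legs(4)
Ad, Bd = discretize_bilinear(A, B, dt=0.1)  # dt is a hypothetical timescale
states = run_ssm(Ad, Bd, np.ones(50))
```

In this sketch `dt` plays the role of the timescale parameter that the abstract says needs careful initialization: it sets how much of the input history the state summarizes.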

Results

Task                Dataset  Metric        Value  Model
Language Modelling  LRA      Avg           86.09  S4
Language Modelling  LRA      Image         88.65  S4
Language Modelling  LRA      ListOps       59.6   S4
Language Modelling  LRA      Pathfinder    94.2   S4
Language Modelling  LRA      Pathfinder-X  96.35  S4
Language Modelling  LRA      Retrieval     90.9   S4
Language Modelling  LRA      Text          86.82  S4
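
As a quick consistency check on the figures above, the reported Avg matches the unweighted mean of the six per-task scores. Treating Avg as a simple mean is an assumption, since the page does not say how it is computed, and the variable names are illustrative.

```python
# Per-task LRA scores for S4, as listed in the results above.
scores = {
    "Image": 88.65,
    "ListOps": 59.6,
    "Pathfinder": 94.2,
    "Pathfinder-X": 96.35,
    "Retrieval": 90.9,
    "Text": 86.82,
}

# Assumption: Avg is the unweighted mean over the six tasks.
avg = sum(scores.values()) / len(scores)
print(round(avg, 2))  # 86.09
```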

Related Papers

U-RWKV: Lightweight medical image segmentation with direction-adaptive RWKV (2025-07-15)
LaCache: Ladder-Shaped KV Caching for Efficient Long-Context Modeling of Large Language Models (2025-07-14)
MambaFusion: Height-Fidelity Dense Global Fusion for Multi-modal 3D Object Detection (2025-07-06)
Med-URWKV: Pure RWKV With ImageNet Pre-training For Medical Image Segmentation (2025-06-12)
M2Restore: Mixture-of-Experts-based Mamba-CNN Fusion Framework for All-in-One Image Restoration (2025-06-09)
CDPDNet: Integrating Text Guidance with Hybrid Vision Encoders for Medical Image Segmentation (2025-05-25)
JanusDNA: A Powerful Bi-directional Hybrid DNA Foundation Model (2025-05-22)
Hybrid-Emba3D: Geometry-Aware and Cross-Path Feature Hybrid Enhanced State Space Model for Point Cloud Classification (2025-05-16)