Simplified State Space Layers for Sequence Modeling

Jimmy T. H. Smith, Andrew Warrington, Scott W. Linderman

2022-08-09
Tasks: ListOps, Long-range modeling, Retrieval


Abstract

Models using structured state space sequence (S4) layers have achieved state-of-the-art performance on long-range sequence modeling tasks. An S4 layer combines linear state space models (SSMs), the HiPPO framework, and deep learning to achieve high performance. We build on the design of the S4 layer and introduce a new state space layer, the S5 layer. Whereas an S4 layer uses many independent single-input, single-output SSMs, the S5 layer uses one multi-input, multi-output SSM. We establish a connection between S5 and S4, and use this to develop the initialization and parameterization used by the S5 model. The result is a state space layer that can leverage efficient and widely implemented parallel scans, allowing S5 to match the computational efficiency of S4, while also achieving state-of-the-art performance on several long-range sequence modeling tasks. S5 averages 87.4% on the Long Range Arena (LRA) benchmark and 98.5% on the most difficult Path-X task.
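To make the "one multi-input, multi-output SSM evaluated with a parallel scan" idea concrete, here is a minimal NumPy sketch. It assumes a complex diagonal state matrix, zero-order-hold discretization, and randomly drawn parameters (S5 itself uses a HiPPO-derived initialization and learned step sizes, which are not reproduced here). All sizes, the fixed step `dt`, and the helper names `binary_op`, `parallel_scan`, and `sequential_scan` are hypothetical. The key point is that the recurrence x_k = Λ̄ x_{k-1} + B̄ u_k is a chain of affine maps, and composing affine maps is associative, so the whole sequence of states can be computed in log2(L) vectorized rounds instead of L sequential steps.

```python
import numpy as np

# Hypothetical sizes: H input/output channels, P latent states, L time steps.
rng = np.random.default_rng(0)
H, P, L = 4, 8, 16

# ONE diagonal MIMO system shared by all H channels (random values are an
# assumption; S5 initializes from HiPPO). Negative real parts keep it stable.
Lambda = -rng.uniform(0.1, 1.0, P) + 1j * rng.normal(size=P)
B = rng.normal(size=(P, H)) + 1j * rng.normal(size=(P, H))
C = rng.normal(size=(H, P)) + 1j * rng.normal(size=(H, P))
D = rng.normal(size=H)  # hypothetical diagonal feedthrough
dt = 0.1                # hypothetical fixed step size

# Zero-order-hold discretization of a diagonal system (elementwise).
Lambda_bar = np.exp(Lambda * dt)                    # (P,)
B_bar = ((Lambda_bar - 1.0) / Lambda)[:, None] * B  # (P, H)

def binary_op(left, right):
    """Compose affine steps x -> a*x + b; `left` is applied first."""
    a_l, b_l = left
    a_r, b_r = right
    return a_r * a_l, a_r * b_l + b_r

def parallel_scan(a, b):
    """Hillis-Steele inclusive scan: log2(L) rounds, each vectorized over time."""
    a, b = a.copy(), b.copy()
    shift = 1
    while shift < len(a):
        # Combine position k with the partial result ending at k - shift.
        a_new, b_new = binary_op((a[:-shift], b[:-shift]), (a[shift:], b[shift:]))
        a[shift:], b[shift:] = a_new, b_new
        shift *= 2
    return b  # b[k] == x_k, assuming x_{-1} = 0

def sequential_scan(a, b):
    """Reference loop: x_k = a_k * x_{k-1} + b_k, one step at a time."""
    x = np.zeros(b.shape[1], dtype=complex)
    xs = []
    for k in range(len(b)):
        x = a[k] * x + b[k]
        xs.append(x)
    return np.stack(xs)

u = rng.normal(size=(L, H))                         # real input sequence
Bu = u @ B_bar.T                                    # (L, P): B_bar u_k for each k
A_seq = np.broadcast_to(Lambda_bar, (L, P)).copy()  # same diagonal at every step
xs = parallel_scan(A_seq, Bu)                       # (L, P) complex states
ys = (xs @ C.T).real + D * u                        # y_k = Re(C x_k) + D u_k
```

The scan above is written out by hand so the associative operator is visible; in JAX the same `binary_op` could instead be passed to `jax.lax.associative_scan`. By contrast, an S4-style layer would run H separate single-input, single-output SSMs, one per channel, rather than the single P-state system shared across channels here.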

Results

Task	Dataset	Subtask	Accuracy (%)	Model
Language Modelling	LRA	Avg	87.46	S5
Language Modelling	LRA	Image	88	S5
Language Modelling	LRA	ListOps	62.15	S5
Language Modelling	LRA	Pathfinder	95.33	S5
Language Modelling	LRA	Pathfinder-X	98.58	S5
Language Modelling	LRA	Retrieval	91.4	S5
Language Modelling	LRA	Text	89.31	S5
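The reported Avg is consistent with an unweighted mean of the six per-task accuracies. A quick check, assuming equal weighting of the tasks:

```python
# S5 per-task LRA accuracies (%) as listed in the results table.
scores = {
    "ListOps": 62.15,
    "Text": 89.31,
    "Retrieval": 91.4,
    "Image": 88.0,
    "Pathfinder": 95.33,
    "Pathfinder-X": 98.58,
}
avg = sum(scores.values()) / len(scores)  # unweighted mean over 6 tasks
print(f"{avg:.2f}")  # prints "87.46"
```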
