
Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges

Badri Narayana Patro, Vijay Srinivas Agneeswaran

2024-04-24 · Speech Recognition · Machine Translation · Music Generation · Named Entity Recognition (NER) · Time Series Forecasting · Time Series · Time Series Analysis · Recommendation Systems

Paper | PDF | Code (official)

Abstract

Sequence modeling is a crucial area across various domains, including Natural Language Processing (NLP), speech recognition, time series forecasting, music generation, and bioinformatics. Recurrent Neural Networks (RNNs) and Long Short-Term Memory networks (LSTMs) historically dominated sequence modeling tasks such as Machine Translation and Named Entity Recognition (NER). However, the advent of transformers shifted this paradigm, given their superior performance. Yet transformers suffer from $O(N^2)$ attention complexity and challenges in handling inductive bias. Several variants that use spectral networks or convolutions have been proposed to address these issues and perform well on a range of tasks, but they still struggle with long sequences. In this context, State Space Models (SSMs) have emerged as promising alternatives for sequence modeling, especially with the advent of S4 and its variants, such as S4ND, HiPPO, Hyena, Diagonal State Spaces (DSS), Gated State Spaces (GSS), the Linear Recurrent Unit (LRU), Liquid-S4, and Mamba. In this survey, we categorize the foundational SSMs into three paradigms: gating architectures, structural architectures, and recurrent architectures. The survey also highlights diverse applications of SSMs across domains such as vision, video, audio, speech, language (especially long-sequence modeling), medicine (including genomics), chemistry (such as drug design), recommendation systems, and time series analysis, including tabular data. Moreover, we consolidate the performance of SSMs on benchmark datasets such as Long Range Arena (LRA), WikiText, GLUE, the Pile, ImageNet, Kinetics-400, and SSv2, as well as video datasets such as Breakfast, COIN, and LVU, and various time series datasets. The project page for Mamba-360 is available at https://github.com/badripatro/mamba360.
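To make the complexity contrast concrete: a linear SSM processes a length-$N$ sequence with one fixed-cost state update per step, i.e. $O(N)$ overall, whereas pairwise self-attention costs $O(N^2)$. Below is a minimal NumPy sketch of the generic discretized SSM recurrence $x_k = A x_{k-1} + B u_k$, $y_k = C x_k$; the dimensions and random parameters are illustrative assumptions, not the parameterization of S4, Mamba, or any specific model covered by the survey.

```python
import numpy as np

def ssm_scan(A, B, C, u):
    """Run the linear SSM recurrence over an input sequence u of shape (N, d_in).

    Each step costs O(d_state^2), independent of N, so the whole scan is
    linear in sequence length -- the key contrast with O(N^2) attention.
    """
    x = np.zeros(A.shape[0])         # hidden state x_0 = 0
    ys = []
    for u_k in u:
        x = A @ x + B @ u_k          # state update: x_k = A x_{k-1} + B u_k
        ys.append(C @ x)             # readout:      y_k = C x_k
    return np.stack(ys)

# Toy, illustrative parameters (not a trained or principled initialization).
rng = np.random.default_rng(0)
d_state, d_in, d_out, N = 16, 4, 4, 1000
A = 0.95 * np.eye(d_state)           # stable dynamics so the state stays bounded
B = rng.normal(size=(d_state, d_in))
C = rng.normal(size=(d_out, d_state))
u = rng.normal(size=(N, d_in))

y = ssm_scan(A, B, C, u)
print(y.shape)                       # (1000, 4)
```

Structured SSMs such as S4 gain their efficiency by constraining $A$ (e.g. diagonal or diagonal-plus-low-rank) so that this scan can also be computed as a long convolution, while Mamba makes $A$, $B$, and $C$ input-dependent and keeps the recurrent form.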

Results

Task | Dataset | Metric | Value | Model
Time Series Forecasting | ETTh1 (336) Multivariate | MAE | 0.443 | SiMBA
Time Series Forecasting | ETTh1 (336) Multivariate | MSE | 0.473 | SiMBA
Time Series Analysis | ETTh1 (336) Multivariate | MAE | 0.443 | SiMBA
Time Series Analysis | ETTh1 (336) Multivariate | MSE | 0.473 | SiMBA
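The MAE and MSE values above follow the standard point-forecast definitions used on forecasting benchmarks; a minimal sketch, assuming the error is averaged over all forecast horizons and variables of the normalized ETTh1 data:

```python
import numpy as np

# Standard point-forecast metrics, assuming y_hat and y are arrays of
# shape (horizon, n_variables).
def mae(y_hat, y):
    return np.mean(np.abs(y_hat - y))   # mean absolute error

def mse(y_hat, y):
    return np.mean((y_hat - y) ** 2)    # mean squared error
```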

Related Papers

IP2: Entity-Guided Interest Probing for Personalized News Recommendation (2025-07-18)
A Reproducibility Study of Product-side Fairness in Bundle Recommendation (2025-07-18)
Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine (2025-07-17)
NonverbalTTS: A Public English Corpus of Text-Aligned Nonverbal Vocalizations with Emotion Annotations for Text-to-Speech (2025-07-17)
The Power of Architecture: Deep Dive into Transformer Architectures for Long-Term Time Series Forecasting (2025-07-17)
MoTM: Towards a Foundation Model for Time Series Imputation based on Continuous Modeling (2025-07-17)
SGCL: Unifying Self-Supervised and Supervised Learning for Graph Recommendation (2025-07-17)
Emergence of Functionally Differentiated Structures via Mutual Information Optimization in Recurrent Neural Networks (2025-07-17)