Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


ConvLSTM

Sequential · Introduced 2015 · 145 papers
Source Paper: Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting (Shi et al., NeurIPS 2015)

Description

ConvLSTM is a type of recurrent neural network for spatio-temporal prediction that has convolutional structures in both the input-to-state and state-to-state transitions. The ConvLSTM determines the future state of a certain cell in the grid from the inputs and past states of its local neighbors. This can easily be achieved by using a convolution operator in the state-to-state and input-to-state transitions (see Figure). The key equations of ConvLSTM are shown below, where ∗ denotes the convolution operator and ⊙ the Hadamard product:

$$i_t = \sigma\left(W_{xi} * \mathcal{X}_t + W_{hi} * \mathcal{H}_{t-1} + W_{ci} \odot \mathcal{C}_{t-1} + b_i\right)$$

$$f_t = \sigma\left(W_{xf} * \mathcal{X}_t + W_{hf} * \mathcal{H}_{t-1} + W_{cf} \odot \mathcal{C}_{t-1} + b_f\right)$$

$$\mathcal{C}_t = f_t \odot \mathcal{C}_{t-1} + i_t \odot \tanh\left(W_{xc} * \mathcal{X}_t + W_{hc} * \mathcal{H}_{t-1} + b_c\right)$$

$$o_t = \sigma\left(W_{xo} * \mathcal{X}_t + W_{ho} * \mathcal{H}_{t-1} + W_{co} \odot \mathcal{C}_t + b_o\right)$$

$$\mathcal{H}_t = o_t \odot \tanh\left(\mathcal{C}_t\right)$$

If we view the states as the hidden representations of moving objects, a ConvLSTM with a larger transitional kernel should be able to capture faster motions while one with a smaller kernel can capture slower motions.

To ensure that the states have the same number of rows and columns as the inputs, padding is needed before applying the convolution operation. Here, padding of the hidden states on the boundary points can be viewed as using the state of the outside world for calculation. Usually, before the first input arrives, we initialize all the states of the LSTM to zero, which corresponds to "total ignorance" of the future.
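The equations and initialization scheme above can be sketched as a single PyTorch cell. This is a minimal illustration, not a reference implementation: the class name `ConvLSTMCell` and its parameter names are our own, the four gate pre-activations are computed with one fused convolution for brevity, and the peephole weights `w_ci`, `w_cf`, `w_co` are simplified to per-channel parameters broadcast over the spatial grid (the paper's Hadamard terms are full channel × height × width tensors).

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Sketch of a ConvLSTM cell (Shi et al., 2015) with peephole terms."""

    def __init__(self, in_ch: int, hid_ch: int, kernel_size: int = 3):
        super().__init__()
        pad = kernel_size // 2  # 'same' padding: states keep the inputs' rows/columns
        # One convolution produces all four gate pre-activations (i, f, c, o)
        self.conv = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, kernel_size, padding=pad)
        # Simplified peephole weights: per-channel, broadcast spatially (assumption)
        self.w_ci = nn.Parameter(torch.zeros(1, hid_ch, 1, 1))
        self.w_cf = nn.Parameter(torch.zeros(1, hid_ch, 1, 1))
        self.w_co = nn.Parameter(torch.zeros(1, hid_ch, 1, 1))
        self.hid_ch = hid_ch

    def forward(self, x, state=None):
        b, _, h, w = x.shape
        if state is None:
            # Zero initialization: "total ignorance" before the first input
            state = (x.new_zeros(b, self.hid_ch, h, w),
                     x.new_zeros(b, self.hid_ch, h, w))
        h_prev, c_prev = state
        # Convolving [X_t, H_{t-1}] jointly equals W_x* ∗ X_t + W_h* ∗ H_{t-1}
        gi, gf, gc, go = self.conv(torch.cat([x, h_prev], dim=1)).chunk(4, dim=1)
        i = torch.sigmoid(gi + self.w_ci * c_prev)   # input gate
        f = torch.sigmoid(gf + self.w_cf * c_prev)   # forget gate
        c = f * c_prev + i * torch.tanh(gc)          # new cell state C_t
        o = torch.sigmoid(go + self.w_co * c)        # output gate
        h_new = o * torch.tanh(c)                    # new hidden state H_t
        return h_new, (h_new, c)
```

Feeding a sequence means calling the cell once per time step and threading `state` through; because of the 'same' padding, the hidden and cell states always match the input's spatial size, e.g. a `(2, 1, 16, 16)` input with `hid_ch=8` yields a `(2, 8, 16, 16)` hidden state.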

Papers Using This Method

- FINN-GL: Generalized Mixed-Precision Extensions for FPGA-Accelerated LSTMs (2025-06-25)
- ReCoGNet: Recurrent Context-Guided Network for 3D MRI Prostate Segmentation (2025-06-24)
- Residual Connection-Enhanced ConvLSTM for Lithium Dendrite Growth Prediction (2025-06-21)
- Deep Learning Weather Models for Subregional Ocean Forecasting: A Case Study on the Canary Current Upwelling System (2025-05-30)
- Convolutional Long Short-Term Memory Neural Networks Based Numerical Simulation of Flow Field (2025-05-21)
- CTP: A hybrid CNN-Transformer-PINN model for ocean front forecasting (2025-05-16)
- Domain Knowledge Integrated CNN-xLSTM-xAtt Network with Multi Stream Feature Fusion for Cuffless Blood Pressure Estimation from Photoplethysmography Signals (2025-05-13)
- Global Climate Model Bias Correction Using Deep Learning (2025-04-27)
- How to systematically develop an effective AI-based bias correction model? (2025-04-21)
- Advancing Video Anomaly Detection: A Bi-Directional Hybrid Framework for Enhanced Single- and Multi-Task Approaches (2025-04-20)
- Exploring FMCW Radars and Feature Maps for Activity Recognition: A Benchmark Study (2025-03-07)
- Regional climate projections using a deep-learning-based model-ranking and downscaling framework: Application to European climate zones (2025-02-27)
- IoT-Based Real-Time Medical-Related Human Activity Recognition Using Skeletons and Multi-Stage Deep Learning for Healthcare (2025-01-13)
- TopoFormer: Integrating Transformers and ConvLSTMs for Coastal Topography Prediction (2025-01-11)
- An End-to-End Two-Stream Network Based on RGB Flow and Representation Flow for Human Action Recognition (2024-11-27)
- Video to Video Generative Adversarial Network for Few-shot Learning Based on Policy Gradient (2024-10-28)
- Video Prediction Transformers without Recurrence or Convolution (2024-10-07)
- Dynamical system prediction from sparse observations using deep neural networks with Voronoi tessellation and physics constraint (2024-08-31)
- FATE: Focal-modulated Attention Encoder for Temperature Prediction (2024-08-21)
- An Improved CovidConvLSTM model for pneumonia-COVID-19 detection and classification (2024-08-21)