Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

STA-LSTM

Spatio-Temporal Attention LSTM

General | Introduced 2000 | 2 papers

Description

In human action recognition, each type of action generally depends on only a few specific kinematic joints. Furthermore, multiple actions may be performed over time. Motivated by these observations, Song et al. proposed a joint spatial and temporal attention network based on LSTM to adaptively find discriminative features and keyframes. Its main attention-related components are a spatial attention sub-network, which selects important regions, and a temporal attention sub-network, which selects key frames. The spatial attention sub-network can be written as:

\begin{align} s_{t} &= U_{s}\tanh(W_{xs}X_{t} + W_{hs}h_{t-1}^{s} + b_{si}) + b_{so} \end{align}
\begin{align} \alpha_{t} &= \text{Softmax}(s_{t}) \end{align}
\begin{align} Y_{t} &= \alpha_{t} X_{t} \end{align}

where $X_{t}$ is the input feature at time $t$; $U_{s}$, $W_{xs}$, $W_{hs}$, $b_{si}$, and $b_{so}$ are learnable parameters; and $h_{t-1}^{s}$ is the hidden state at step $t-1$. Because the scores depend on the hidden state $h$, the attention process takes temporal relationships into account.
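The three spatial-attention equations above can be traced with a small standard-library Python sketch. It is illustrative only, not the authors' implementation: it assumes $X_{t}$ holds one scalar feature per joint, that $\alpha_{t} X_{t}$ is an element-wise product, and that the hidden state is supplied from outside rather than produced by an LSTM. The sizes `K`, `H`, `D`, the helper names, and the random weights are all hypothetical.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_attention(X_t, h_prev, U_s, W_xs, W_hs, b_si, b_so):
    """Return (alpha_t, Y_t) for one time step."""
    # Inner term: tanh(W_xs X_t + W_hs h_{t-1}^s + b_si)
    inner = [
        math.tanh(a + b + c)
        for a, b, c in zip(matvec(W_xs, X_t), matvec(W_hs, h_prev), b_si)
    ]
    # Scores: s_t = U_s * inner + b_so, one score per joint
    s_t = [u + b for u, b in zip(matvec(U_s, inner), b_so)]
    # Attention weights: alpha_t = Softmax(s_t)
    alpha_t = softmax(s_t)
    # Gated input: Y_t = alpha_t * X_t (assumed element-wise)
    Y_t = [a * x for a, x in zip(alpha_t, X_t)]
    return alpha_t, Y_t


def random_matrix(rows, cols, rng):
    """Hypothetical initializer: small uniform random weights."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


# Hypothetical sizes: K joints, H hidden units, D attention units.
K, H, D = 4, 3, 5
rng = random.Random(0)
X_t = [rng.uniform(-1, 1) for _ in range(K)]
h_prev = [0.0] * H  # stand-in for the previous hidden state
alpha_t, Y_t = spatial_attention(
    X_t,
    h_prev,
    U_s=random_matrix(K, D, rng),
    W_xs=random_matrix(D, K, rng),
    W_hs=random_matrix(D, H, rng),
    b_si=[0.0] * D,
    b_so=[0.0] * K,
)
print("alpha_t:", alpha_t)
print("Y_t:", Y_t)
```

Because of the softmax, the weights in `alpha_t` are positive and sum to one, so the joints compete for a fixed attention budget at each time step.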

The temporal attention sub-network is similar to the spatial branch and produces its attention map using:

\begin{align} \beta_{t} &= \delta(W_{xp}X_{t} + W_{hp}h_{t-1}^{p} + b_{p}) \end{align}

The activation $\delta$ is a ReLU rather than a normalization function, for ease of optimization. The model also uses a regularized objective function to improve convergence.
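The temporal gate can be sketched in the same way. This is a minimal illustration rather than the authors' code: it assumes $\beta_{t}$ is a single scalar per frame, so $W_{xp}$ and $W_{hp}$ reduce to weight vectors (`w_xp`, `w_hp`), and it passes in a fixed stand-in for $h_{t-1}^{p}$ instead of running an LSTM. All numeric values are hypothetical.

```python
def dot(w, v):
    """Inner product of two equal-length lists of floats."""
    return sum(a * b for a, b in zip(w, v))


def temporal_attention(X_t, h_prev, w_xp, w_hp, b_p):
    """beta_t = ReLU(W_xp X_t + W_hp h_{t-1}^p + b_p), assumed scalar."""
    pre_activation = dot(w_xp, X_t) + dot(w_hp, h_prev) + b_p
    return max(0.0, pre_activation)  # ReLU: unnormalized and non-negative


# Hypothetical weights for 3 input features and 2 hidden units.
w_xp = [0.5, -1.0, 0.25]
w_hp = [1.0, 0.0]
b_p = 0.1

# Two hypothetical frames sharing the same stand-in hidden state.
frames = [[1.0, 0.0, 2.0], [0.0, 2.0, 0.0]]
h_prev = [0.5, 0.5]

betas = [temporal_attention(X_t, h_prev, w_xp, w_hp, b_p) for X_t in frames]
print("betas:", betas)
```

Unlike the softmax weights of the spatial branch, these gates are not constrained to sum to one: a frame can be switched off entirely (the second frame here has a negative pre-activation and is gated to zero), and a gate can exceed one.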

Overall, this paper presents a joint spatiotemporal attention method to focus on important joints and keyframes, with excellent results on the action recognition task.

Papers Using This Method

Exploring Transformer-Augmented LSTM for Temporal and Spatial Feature Learning in Trajectory Prediction (2024-12-18)
An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data (2016-11-18)