TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/StepNet: Spatial-temporal Part-aware Network for Isolated ...

StepNet: Spatial-temporal Part-aware Network for Isolated Sign Language Recognition

Xiaolong Shen, Zhedong Zheng, Yi Yang

2022-12-25Optical Flow EstimationSign Language Recognition
PaperPDF

Abstract

The goal of sign language recognition (SLR) is to help those who are hard of hearing or deaf overcome the communication barrier. Most existing approaches can be typically divided into two lines, i.e., Skeleton-based and RGB-based methods, but both the two lines of methods have their limitations. Skeleton-based methods do not consider facial expressions, while RGB-based approaches usually ignore the fine-grained hand structure. To overcome both limitations, we propose a new framework called Spatial-temporal Part-aware network~(StepNet), based on RGB parts. As its name suggests, it is made up of two modules: Part-level Spatial Modeling and Part-level Temporal Modeling. Part-level Spatial Modeling, in particular, automatically captures the appearance-based properties, such as hands and faces, in the feature space without the use of any keypoint-level annotations. On the other hand, Part-level Temporal Modeling implicitly mines the long-short term context to capture the relevant attributes over time. Extensive experiments demonstrate that our StepNet, thanks to spatial-temporal modules, achieves competitive Top-1 Per-instance accuracy on three commonly-used SLR benchmarks, i.e., 56.89% on WLASL, 77.2% on NMFs-CSL, and 77.1% on BOBSL. Additionally, the proposed method is compatible with the optical flow input and can produce superior performance if fused. For those who are hard of hearing, we hope that our work can act as a preliminary step.

Results

TaskDatasetMetricValueModel
Sign Language RecognitionWLASLTop-1 Accuracy61.17StepNet
Sign Language RecognitionWLASL100Top-1 Accuracy78.29StepNet
Sign Language RecognitionBOBSLActions Top-177.1StepNet
Sign Language RecognitionWLASL-2000Top-1 Accuracy61.17StepNet

Related Papers

Channel-wise Motion Features for Efficient Motion Segmentation2025-07-17An Efficient Approach for Muscle Segmentation and 3D Reconstruction Using Keypoint Tracking in MRI Scan2025-07-11Learning to Track Any Points from Human Motion2025-07-08TLB-VFI: Temporal-Aware Latent Brownian Bridge Diffusion for Video Frame Interpolation2025-07-07MEMFOF: High-Resolution Training for Memory-Efficient Multi-Frame Optical Flow Estimation2025-06-29EndoFlow-SLAM: Real-Time Endoscopic SLAM with Flow-Constrained Gaussian Splatting2025-06-26WAFT: Warping-Alone Field Transforms for Optical Flow2025-06-26Hierarchical Sub-action Tree for Continuous Sign Language Recognition2025-06-26