Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

STEP-CATFormer

Reported on 48 benchmarks across 8 tasks · 1 paper
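The headline count is the sum of the per-area subtotals listed further down this page. A minimal Python sketch of that arithmetic follows. The dictionary simply transcribes the area headings and task lists shown on this page (six entries per task: three NTU RGB+D 120 metrics and three NTU RGB+D metrics), and `tally` is a hypothetical helper for illustration, not part of any site API.

```python
# Area -> {task: number of result entries}, transcribed from the headings on this page.
RESULTS_BY_AREA = {
    "Computer Vision": {"Video": 6, "Temporal Action Localization": 6, "Action Localization": 6},
    "Time Series": {"Action Detection": 6, "Action Recognition": 6},
    "Methodology": {"Zero-Shot Learning": 6},
    "Robots": {"Activity Recognition": 6},
    "Natural Language Processing": {"3D Action Recognition": 6},
}


def tally(results_by_area):
    """Return (total result entries, number of distinct task names)."""
    total = sum(n for tasks in results_by_area.values() for n in tasks.values())
    task_names = {t for tasks in results_by_area.values() for t in tasks}
    return total, len(task_names)


total, n_tasks = tally(RESULTS_BY_AREA)
print(f"{total} results across {n_tasks} tasks")  # 48 results across 8 tasks
```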

Note: results are matched by exact model name. Different papers may use the same name for different model variants.
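How the site implements this matching is not documented here. Assuming it uses plain string equality with no normalization, a sketch like the following shows both consequences of the note: two variants that share a name collapse into one group, while names that differ only in punctuation (for example, this page's `STEP-CATFormer` versus the paper title's `STEP CATFormer`) stay separate. Apart from `arXiv:2312.03288`, the paper IDs are hypothetical placeholders, and `group_by_exact_name` is an illustrative helper.

```python
from collections import defaultdict

# Hypothetical leaderboard rows: (model_name, paper_id).
# Only arXiv:2312.03288 comes from this page; the other IDs are placeholders.
ROWS = [
    ("STEP-CATFormer", "arXiv:2312.03288"),
    ("STEP-CATFormer", "hypothetical-paper-B"),  # same name, possibly a different variant
    ("STEP CATFormer", "hypothetical-paper-C"),  # differs only by a hyphen
]


def group_by_exact_name(rows):
    """Group paper IDs by model name using strict string equality (no normalization)."""
    groups = defaultdict(list)
    for name, paper_id in rows:
        groups[name].append(paper_id)
    return dict(groups)


groups = group_by_exact_name(ROWS)
for name, papers in groups.items():
    print(f"{name} -> {papers}")
# STEP-CATFormer -> ['arXiv:2312.03288', 'hypothetical-paper-B']
# STEP CATFormer -> ['hypothetical-paper-C']
```

Normalizing names (for example, lowercasing and stripping punctuation) would merge the last two groups as well, trading missed matches for more accidental merges.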

Computer Vision · 18 results

  • Video on NTU RGB+D 120
    Accuracy (Cross-Setup) · uses extra data · 2023-12-06
    91.2
    best: 92.2 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Video on NTU RGB+D 120
    Accuracy (Cross-Subject) · uses extra data · 2023-12-06
    90
    best: 90.9 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Video on NTU RGB+D 120
    Ensembled Modalities · uses extra data · 2023-12-06
    4
    best: 6 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Video on NTU RGB+D
    Accuracy (CS) · uses extra data · 2023-12-06
    93.2
    best: 94.3 (Hulk(Finetune, ViT-L))
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Video on NTU RGB+D
    Accuracy (CV) · uses extra data · 2023-12-06
    97.3
    best: 98.3 (ST-GCN [PYSKL, 2D Skeleton])
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Video on NTU RGB+D
    Ensembled Modalities · uses extra data · 2023-12-06
    4
    best: 6 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Temporal Action Localization on NTU RGB+D 120
    Accuracy (Cross-Setup) · uses extra data · 2023-12-06
    91.2
    best: 92.2 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Temporal Action Localization on NTU RGB+D 120
    Accuracy (Cross-Subject) · uses extra data · 2023-12-06
    90
    best: 90.9 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Temporal Action Localization on NTU RGB+D 120
    Ensembled Modalities · uses extra data · 2023-12-06
    4
    best: 6 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Temporal Action Localization on NTU RGB+D
    Accuracy (CS) · uses extra data · 2023-12-06
    93.2
    best: 94.3 (Hulk(Finetune, ViT-L))
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Temporal Action Localization on NTU RGB+D
    Accuracy (CV) · uses extra data · 2023-12-06
    97.3
    best: 98.3 (ST-GCN [PYSKL, 2D Skeleton])
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Temporal Action Localization on NTU RGB+D
    Ensembled Modalities · uses extra data · 2023-12-06
    4
    best: 6 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Action Localization on NTU RGB+D 120
    Accuracy (Cross-Setup) · uses extra data · 2023-12-06
    91.2
    best: 92.2 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Action Localization on NTU RGB+D 120
    Accuracy (Cross-Subject) · uses extra data · 2023-12-06
    90
    best: 90.9 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Action Localization on NTU RGB+D 120
    Ensembled Modalities · uses extra data · 2023-12-06
    4
    best: 6 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Action Localization on NTU RGB+D
    Accuracy (CS) · uses extra data · 2023-12-06
    93.2
    best: 94.3 (Hulk(Finetune, ViT-L))
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Action Localization on NTU RGB+D
    Accuracy (CV) · uses extra data · 2023-12-06
    97.3
    best: 98.3 (ST-GCN [PYSKL, 2D Skeleton])
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Action Localization on NTU RGB+D
    Ensembled Modalities · uses extra data · 2023-12-06
    4
    best: 6 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288

Time Series · 12 results

  • Action Detection on NTU RGB+D 120
    Accuracy (Cross-Setup) · uses extra data · 2023-12-06
    91.2
    best: 92.2 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Action Detection on NTU RGB+D 120
    Accuracy (Cross-Subject) · uses extra data · 2023-12-06
    90
    best: 90.9 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Action Detection on NTU RGB+D 120
    Ensembled Modalities · uses extra data · 2023-12-06
    4
    best: 6 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Action Detection on NTU RGB+D
    Accuracy (CS) · uses extra data · 2023-12-06
    93.2
    best: 94.3 (Hulk(Finetune, ViT-L))
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Action Detection on NTU RGB+D
    Accuracy (CV) · uses extra data · 2023-12-06
    97.3
    best: 98.3 (ST-GCN [PYSKL, 2D Skeleton])
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Action Detection on NTU RGB+D
    Ensembled Modalities · uses extra data · 2023-12-06
    4
    best: 6 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Action Recognition on NTU RGB+D 120
    Accuracy (Cross-Setup) · uses extra data · 2023-12-06
    91.2
    best: 96.7 (DSCNet (RGB + Pose))
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Action Recognition on NTU RGB+D 120
    Accuracy (Cross-Subject) · uses extra data · 2023-12-06
    90
    best: 95.6 (DSCNet (RGB + Pose))
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Action Recognition on NTU RGB+D 120
    Ensembled Modalities · uses extra data · 2023-12-06
    4
    best: 6 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Action Recognition on NTU RGB+D
    Accuracy (CS) · uses extra data · 2023-12-06
    93.2
    best: 97.4 (DSCNet (RGB + Pose))
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Action Recognition on NTU RGB+D
    Accuracy (CV) · uses extra data · 2023-12-06
    97.3
    best: 99.6 (PoseC3D (RGB + Pose))
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Action Recognition on NTU RGB+D
    Ensembled Modalities · uses extra data · 2023-12-06
    4
    best: 6 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288

Methodology · 6 results

  • Zero-Shot Learning on NTU RGB+D 120
    Accuracy (Cross-Setup) · uses extra data · 2023-12-06
    91.2
    best: 92.2 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Zero-Shot Learning on NTU RGB+D 120
    Accuracy (Cross-Subject) · uses extra data · 2023-12-06
    90
    best: 90.9 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Zero-Shot Learning on NTU RGB+D 120
    Ensembled Modalities · uses extra data · 2023-12-06
    4
    best: 6 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Zero-Shot Learning on NTU RGB+D
    Accuracy (CS) · uses extra data · 2023-12-06
    93.2
    best: 94.3 (Hulk(Finetune, ViT-L))
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Zero-Shot Learning on NTU RGB+D
    Accuracy (CV) · uses extra data · 2023-12-06
    97.3
    best: 98.3 (ST-GCN [PYSKL, 2D Skeleton])
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Zero-Shot Learning on NTU RGB+D
    Ensembled Modalities · uses extra data · 2023-12-06
    4
    best: 6 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288

Robots · 6 results

  • Activity Recognition on NTU RGB+D 120
    Accuracy (Cross-Setup) · uses extra data · 2023-12-06
    91.2
    best: 96.7 (DSCNet (RGB + Pose))
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Activity Recognition on NTU RGB+D 120
    Accuracy (Cross-Subject) · uses extra data · 2023-12-06
    90
    best: 95.6 (DSCNet (RGB + Pose))
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Activity Recognition on NTU RGB+D 120
    Ensembled Modalities · uses extra data · 2023-12-06
    4
    best: 6 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Activity Recognition on NTU RGB+D
    Accuracy (CS) · uses extra data · 2023-12-06
    93.2
    best: 97.4 (DSCNet (RGB + Pose))
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Activity Recognition on NTU RGB+D
    Accuracy (CV) · uses extra data · 2023-12-06
    97.3
    best: 99.6 (PoseC3D (RGB + Pose))
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • Activity Recognition on NTU RGB+D
    Ensembled Modalities · uses extra data · 2023-12-06
    4
    best: 6 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288

Natural Language Processing · 6 results

  • 3D Action Recognition on NTU RGB+D 120
    Accuracy (Cross-Setup) · uses extra data · 2023-12-06
    91.2
    best: 92.2 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • 3D Action Recognition on NTU RGB+D 120
    Accuracy (Cross-Subject) · uses extra data · 2023-12-06
    90
    best: 90.9 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • 3D Action Recognition on NTU RGB+D 120
    Ensembled Modalities · uses extra data · 2023-12-06
    4
    best: 6 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • 3D Action Recognition on NTU RGB+D
    Accuracy (CS) · uses extra data · 2023-12-06
    93.2
    best: 94.3 (Hulk(Finetune, ViT-L))
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • 3D Action Recognition on NTU RGB+D
    Accuracy (CV) · uses extra data · 2023-12-06
    97.3
    best: 98.3 (ST-GCN [PYSKL, 2D Skeleton])
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288
  • 3D Action Recognition on NTU RGB+D
    Ensembled Modalities · uses extra data · 2023-12-06
    4
    best: 6 (ProtoGCN)
    STEP CATFormer: Spatial-Temporal Effective Body-Part Cross Attention Transformer for Skeleton-based Action Recognition · arXiv:2312.03288