Action Recognition on Diving-48
Metric: Accuracy (higher is better)
Results
| # | Model | Accuracy | Extra Data | SOTA | Paper | Date | Code |
|---|-------|----------|------------|------|-------|------|------|
| 1 | LVMAE | 94.9 | Yes | SOTA | Extending Video Masked Autoencoders to 128 frames | 2024-11-20 | - |
| 2 | Video-FocalNet-B | 90.8 | No | SOTA | Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition | 2023-07-13 | Yes |
| 3 | AIM (CLIP ViT-L/14, 32x224) | 90.6 | Yes | SOTA | AIM: Adapting Image Models for Efficient Video Action Recognition | 2023-02-06 | Yes |
| 4 | DUALPATH | 88.7 | No | | Dual-path Adaptation from Image to Video Transformers | 2023-03-17 | Yes |
| 5 | TFCNet | 88.3 | No | SOTA | TFCNet: Temporal Fully Connected Networks for Static Unbiased Temporal Reasoning | 2022-03-11 | - |
| 6 | StructVit-B-4-1 | 88.3 | No | | Learning Correlation Structures for Vision Transformers | 2024-04-05 | - |
| 7 | ORViT TimeSformer | 88 | No | SOTA | Object-Region Video Transformers | 2021-10-13 | Yes |
| 8 | GC-TDN | 87.6 | No | | Group Contextualization for Video Recognition | 2022-03-18 | Yes |
| 9 | BEVT | 86.7 | No | | BEVT: BERT Pretraining of Video Transformers | 2021-12-02 | Yes |
| 10 | PSB | 86 | No | | Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition | 2022-07-27 | Yes |
| 11 | VIMPAC | 85.5 | No | SOTA | VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning | 2021-06-21 | Yes |
| 12 | RSANet-R50 (16 frames, ImageNet pretrained, a single clip) | 84.2 | No | | Relational Self-Attention: What's Missing in Attention for Video Understanding | 2021-11-02 | Yes |
| 13 | TQN | 81.8 | No | SOTA | Temporal Query Networks for Fine-grained Video Understanding | 2021-04-19 | - |
| 14 | PMI Sampler | 81.3 | No | | PMI Sampler: Patch Similarity Guided Frame Selection for Aerial Action Recognition | 2023-04-14 | Yes |
| 15 | TimeSformer-L | 81 | No | SOTA | Is Space-Time Attention All You Need for Video Understanding? | 2021-02-09 | Yes |
| 16 | TimeSformer-HR | 78 | No | | Is Space-Time Attention All You Need for Video Understanding? | 2021-02-09 | Yes |
| 17 | SlowFast | 77.6 | No | SOTA | SlowFast Networks for Video Recognition | 2018-12-10 | Yes |
| 18 | TimeSformer | 75 | No | | Is Space-Time Attention All You Need for Video Understanding? | 2021-02-09 | Yes |
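
The page does not say how the SOTA label is assigned. One plausible reading, offered here only as an assumption, is that an entry is labeled SOTA when its accuracy exceeded every entry with an earlier date. The minimal Python sketch below checks that reading against the listed results; the function name `sota_at_release` and the tie-breaking rule for entries sharing a date are illustrative choices, not part of the leaderboard.

```python
from datetime import date

# (model, accuracy, date) copied from the leaderboard entries.
RESULTS = [
    ("LVMAE", 94.9, date(2024, 11, 20)),
    ("Video-FocalNet-B", 90.8, date(2023, 7, 13)),
    ("AIM (CLIP ViT-L/14, 32x224)", 90.6, date(2023, 2, 6)),
    ("DUALPATH", 88.7, date(2023, 3, 17)),
    ("TFCNet", 88.3, date(2022, 3, 11)),
    ("StructVit-B-4-1", 88.3, date(2024, 4, 5)),
    ("ORViT TimeSformer", 88, date(2021, 10, 13)),
    ("GC-TDN", 87.6, date(2022, 3, 18)),
    ("BEVT", 86.7, date(2021, 12, 2)),
    ("PSB", 86, date(2022, 7, 27)),
    ("VIMPAC", 85.5, date(2021, 6, 21)),
    ("RSANet-R50 (16 frames, ImageNet pretrained, a single clip)", 84.2, date(2021, 11, 2)),
    ("TQN", 81.8, date(2021, 4, 19)),
    ("PMI Sampler", 81.3, date(2023, 4, 14)),
    ("TimeSformer-L", 81, date(2021, 2, 9)),
    ("TimeSformer-HR", 78, date(2021, 2, 9)),
    ("SlowFast", 77.6, date(2018, 12, 10)),
    ("TimeSformer", 75, date(2021, 2, 9)),
]


def sota_at_release(results):
    """Return models that strictly beat every earlier-dated entry.

    Assumption: for entries sharing a date, only the highest-scoring
    one can qualify, so ties on date are ordered best-first.
    """
    best = float("-inf")
    winners = []
    for model, accuracy, _day in sorted(results, key=lambda r: (r[2], -r[1])):
        if accuracy > best:
            winners.append(model)
            best = accuracy
    return winners


if __name__ == "__main__":
    for name in sota_at_release(RESULTS):
        print(name)
```

Under this assumption the function returns nine models, in chronological order, and they coincide with the nine entries labeled SOTA on this page. StructVit-B-4-1 is not among them because it ties TFCNet at 88.3 at a later date, after AIM and Video-FocalNet-B had already scored higher.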