Action Recognition on Diving-48

Metric: Accuracy (higher is better)
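
The page does not define the metric further. This sketch assumes it is top-1 classification accuracy in percent: the share of test videos whose predicted class (one of Diving-48's 48 dive classes) matches the ground-truth label. The function name `top1_accuracy` and the toy inputs are hypothetical. Papers may also differ in test-time protocol (number of clips or crops per video), as the "a single clip" note on the RSANet entry suggests, so scores are not always strictly comparable.

```python
def top1_accuracy(predictions, labels):
    """Percentage of videos whose predicted class index equals the label.

    Assumption: each prediction and label is an integer class index in 0..47.
    """
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("need at least one labelled video")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return 100.0 * correct / len(labels)


# Hypothetical predictions for five test videos: four of five are correct.
print(top1_accuracy([3, 17, 17, 40, 0], [3, 17, 22, 40, 0]))  # 80.0
```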

Results

Rank	Model	Accuracy (%)	Extra Data	Paper	Date	Code	SOTA
1	LVMAE	94.9	Yes	Extending Video Masked Autoencoders to 128 frames	2024-11-20	No	Yes
2	Video-FocalNet-B	90.8	No	Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition	2023-07-13	Yes	Yes
3	AIM (CLIP ViT-L/14, 32x224)	90.6	Yes	AIM: Adapting Image Models for Efficient Video Action Recognition	2023-02-06	Yes	Yes
4	DUALPATH	88.7	No	Dual-path Adaptation from Image to Video Transformers	2023-03-17	Yes	No
5	TFCNet	88.3	No	TFCNet: Temporal Fully Connected Networks for Static Unbiased Temporal Reasoning	2022-03-11	No	Yes
6	StructVit-B-4-1	88.3	No	Learning Correlation Structures for Vision Transformers	2024-04-05	No	No
7	ORViT TimeSformer	88.0	No	Object-Region Video Transformers	2021-10-13	Yes	Yes
8	GC-TDN	87.6	No	Group Contextualization for Video Recognition	2022-03-18	Yes	No
9	BEVT	86.7	No	BEVT: BERT Pretraining of Video Transformers	2021-12-02	Yes	No
10	PSB	86.0	No	Spatiotemporal Self-attention Modeling with Temporal Patch Shift for Action Recognition	2022-07-27	Yes	No
11	VIMPAC	85.5	No	VIMPAC: Video Pre-Training via Masked Token Prediction and Contrastive Learning	2021-06-21	Yes	Yes
12	RSANet-R50 (16 frames, ImageNet pretrained, a single clip)	84.2	No	Relational Self-Attention: What's Missing in Attention for Video Understanding	2021-11-02	Yes	No
13	TQN	81.8	No	Temporal Query Networks for Fine-grained Video Understanding	2021-04-19	No	Yes
14	PMI Sampler	81.3	No	PMI Sampler: Patch Similarity Guided Frame Selection for Aerial Action Recognition	2023-04-14	Yes	No
15	TimeSformer-L	81.0	No	Is Space-Time Attention All You Need for Video Understanding?	2021-02-09	Yes	Yes
16	TimeSformer-HR	78.0	No	Is Space-Time Attention All You Need for Video Understanding?	2021-02-09	Yes	No
17	SlowFast	77.6	No	SlowFast Networks for Video Recognition	2018-12-10	Yes	Yes
18	TimeSformer	75.0	No	Is Space-Time Attention All You Need for Video Understanding?	2021-02-09	Yes	No

Code: Yes if a code link is listed for the paper. SOTA: Yes if the entry had the highest accuracy among all entries in this list dated on or before its publication date.
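
The SOTA markers on this leaderboard are consistent with a simple rule: walk the results in date order and flag each one that beats the best score seen so far. The sketch below reproduces the markers under that assumed rule; it is not the site's actual implementation. The data is copied from the leaderboard, while `ENTRIES` and `sota_progression` are hypothetical names. On a shared date (the three TimeSformer variants), higher accuracy is visited first, so only the best same-day result can set a record.

```python
from datetime import date

# (model, accuracy in %, publication date), copied from the leaderboard.
ENTRIES = [
    ("LVMAE", 94.9, date(2024, 11, 20)),
    ("Video-FocalNet-B", 90.8, date(2023, 7, 13)),
    ("AIM (CLIP ViT-L/14, 32x224)", 90.6, date(2023, 2, 6)),
    ("DUALPATH", 88.7, date(2023, 3, 17)),
    ("TFCNet", 88.3, date(2022, 3, 11)),
    ("StructVit-B-4-1", 88.3, date(2024, 4, 5)),
    ("ORViT TimeSformer", 88.0, date(2021, 10, 13)),
    ("GC-TDN", 87.6, date(2022, 3, 18)),
    ("BEVT", 86.7, date(2021, 12, 2)),
    ("PSB", 86.0, date(2022, 7, 27)),
    ("VIMPAC", 85.5, date(2021, 6, 21)),
    ("RSANet-R50 (16 frames, ImageNet pretrained, a single clip)", 84.2, date(2021, 11, 2)),
    ("TQN", 81.8, date(2021, 4, 19)),
    ("PMI Sampler", 81.3, date(2023, 4, 14)),
    ("TimeSformer-L", 81.0, date(2021, 2, 9)),
    ("TimeSformer-HR", 78.0, date(2021, 2, 9)),
    ("SlowFast", 77.6, date(2018, 12, 10)),
    ("TimeSformer", 75.0, date(2021, 2, 9)),
]


def sota_progression(entries):
    """Return, in date order, the models that set a new best accuracy.

    Assumed rule: sort by date (ties broken by higher accuracy first) and
    keep every entry whose accuracy is strictly above the running best.
    """
    best = float("-inf")
    record_setters = []
    for name, accuracy, published in sorted(entries, key=lambda e: (e[2], -e[1])):
        if accuracy > best:
            best = accuracy
            record_setters.append(name)
    return record_setters


# Prints nine models, from SlowFast (2018) to LVMAE (2024).
print(sota_progression(ENTRIES))
```

Filtering on the Extra Data flag before calling the function would give a separate progression for entries trained without extra data.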