Zero-Shot Action Recognition on UCF101
Metric: Top-1 Accuracy (higher is better)
Results
| # | Model | Top-1 Accuracy | Extra Data | SOTA | Paper | Date | Code |
|---|---|---|---|---|---|---|---|
| 1 | OTI(ViT-L/14) | 92.8 | No | SOTA | Orthogonal Temporal Interpolation for Zero-Shot Video Recognition | 2023-08-14 | Code |
| 2 | IMP-MoE-L | 91.5 | Yes | SOTA | Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception | 2023-05-10 | - |
| 3 | MOV (ViT-L/14) | 87.1 | No | SOTA | Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models | 2022-07-15 | - |
| 4 | VideoCoCa | 86.6 | Yes | - | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | 2022-12-09 | - |
| 5 | BIKE | 86.6 | No | - | Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | 2022-12-31 | Code |
| 6 | Text4Vis | 85.8 | No | SOTA | Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | 2022-07-04 | Code |
| 7 | TC-CLIP | 85.4 | No | - | Leveraging Temporal Contextualization for Video Action Recognition | 2024-04-15 | Code |
| 8 | EVA-CLIP-E/14+ | 83.1 | Yes | - | EVA-CLIP: Improved Training Techniques for CLIP at Scale | 2023-03-27 | Code |
| 9 | MOV (ViT-B/16) | 82.6 | No | - | Multimodal Open-Vocabulary Video Classification via Pre-Trained Vision and Language Models | 2022-07-15 | - |
| 10 | OST | 79.7 | No | - | OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition | 2023-11-30 | Code |
| 11 | EZ-CLIP | 79.1 | Yes | - | EZ-CLIP: Efficient Zeroshot Video Action Recognition | 2023-12-13 | Code |
| 12 | MAXI | 78.2 | No | - | MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge | 2023-03-15 | Code |
| 13 | LoCATe-GAT | 76 | No | - | - | - | Code |
| 14 | VicTR (ViT-B/16) | 72.4 | No | - | VicTR: Video-conditioned Text Representations for Activity Recognition | 2023-04-05 | - |
| 15 | X-CLIP | 72 | No | - | Expanding Language-Image Pretrained Models for General Video Recognition | 2022-08-04 | Code |
| 16 | ResT | 58.7 | No | SOTA | Cross-modal Representation Learning for Zero-shot Action Recognition | 2022-05-03 | - |
| 17 | AURL | 58 | No | SOTA | Alignment-Uniformity aware Representation Learning for Zero-shot Video Classification | 2022-03-29 | Code |
| 18 | JigsawNet | 56 | No | - | - | - | Code |
| 19 | CLASTER | 53.9 | No | SOTA | CLASTER: Clustering with Reinforcement Learning for Zero-Shot Action Recognition | 2021-01-18 | - |
| 20 | ER-ZSAR | 51.8 | No | - | Elaborative Rehearsal for Zero-shot Action Recognition | 2021-08-05 | Code |
| 21 | E2E | 48 | No | SOTA | Rethinking Zero-shot Video Classification: End-to-end Training for Realistic Applications | 2020-03-03 | Code |
| 22 | SPOT | 40.9 | No | - | Synthetic Sample Selection for Generalized Zero-Shot Learning | 2023-04-06 | - |
| 23 | TS-GCN | 34.2 | No | - | - | - | Code |
| 24 | O2A | 30.3 | No | SOTA | Objects2action: Classifying and localizing actions without any video example | 2015-10-23 | - |
| 25 | ASR | 24.4 | No | - | Alternative Semantic Representations for Zero-Shot Human Action Recognition | 2017-06-28 | - |
| 26 | UR | 17.5 | No | - | Towards Universal Representation for Unseen Action Recognition | 2018-03-22 | - |
| 27 | IAP | 16.7 | No | - | - | - | - |
| 28 | DAP | 15.9 | No | - | - | - | - |
| 29 | MTE | 15.8 | No | - | Multi-Task Zero-Shot Action Recognition with Prioritised Data Augmentation | 2016-11-26 | - |
| 30 | ZSECOC | 15.1 | No | - | - | - | - |
| 31 | ESZSL | 15 | No | - | - | - | Code |
| 32 | HAA | 14.9 | No | - | - | - | - |
| 33 | SJE(Attribute) | 12 | No | SOTA | Evaluation of Output Embeddings for Fine-Grained Image Classification | 2014-09-30 | Code |
| 34 | SVE | 10.9 | No | - | Semantic Embedding Space for Zero-Shot Action Recognition | 2015-02-05 | - |
| 35 | SJE(Word Embedding) | 9.9 | No | - | Evaluation of Output Embeddings for Fine-Grained Image Classification | 2014-09-30 | Code |