Zero-Shot Action Recognition on Kinetics
Metric: Top-5 Accuracy (higher is better)
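The page names the metric but does not define it. As a minimal sketch, assuming the usual definition (a prediction counts as correct when the true class is among the five highest-scoring classes, reported as a percentage), it could be computed as below. The function name `top_k_accuracy` and the toy scores are illustrative assumptions, not taken from any listed paper or from the leaderboard's own evaluation code.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 5,
) -> float:
    """Percentage of samples whose true label is among the k highest-scoring classes.

    Assumed definition; `scores[i][c]` is the model's score for class c on sample i.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not scores:
        raise ValueError("need at least one sample")

    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        hits += label in top_k
    return 100.0 * hits / len(labels)


# Toy data (hypothetical): 2 clips, 6 candidate classes.
scores = [
    [0.10, 0.30, 0.05, 0.20, 0.25, 0.10],  # true class 2 has the lowest score
    [0.90, 0.02, 0.02, 0.02, 0.02, 0.02],  # true class 0 has the highest score
]
labels = [2, 0]
print(top_k_accuracy(scores, labels, k=5))
```

In the toy data the first clip's true class falls outside the top five and the second clip's true class ranks first, so one of the two clips counts as a hit.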
Results
| # | Model | Top-5 Accuracy | SOTA | Extra Data | Paper | Date | Code |
|---|-------|----------------|------|------------|-------|------|------|
| 1 | TC-CLIP | 95.7 | SOTA | No | Leveraging Temporal Contextualization for Video Action Recognition | 2024-04-15 | Code |
| 2 | OST | 94.6 | SOTA | No | OST: Refining Text Knowledge with Optimal Spatio-Temporal Descriptor for General Video Recognition | 2023-11-30 | Code |
| 3 | BIKE | 91.1 | SOTA | No | Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models | 2022-12-31 | Code |
| 4 | Text4Vis | 90.3 | SOTA | No | Revisiting Classifier: Transferring Vision-Language Models for Video Recognition | 2022-07-04 | Code |
| 5 | VideoCoCa | 88.9 | - | Yes | VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners | 2022-12-09 | - |
| 6 | X-CLIP | 86.1 | - | No | Expanding Language-Image Pretrained Models for General Video Recognition | 2022-08-04 | Code |
| 7 | LanguageBind | 85.7 | - | Yes | LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment | 2023-10-03 | Code |
| 8 | JigsawNet | 78.8 | - | No | - | - | Code |
| 9 | ER-ZSAR (ST+Obj) | 73.1 | SOTA | No | Elaborative Rehearsal for Zero-shot Action Recognition | 2021-08-05 | Code |
| 10 | ER-ZSAR (ST) | 69.3 | - | No | Elaborative Rehearsal for Zero-shot Action Recognition | 2021-08-05 | Code |
| 11 | DEVISE | 51 | - | No | - | - | - |
| 12 | ALE | 50.3 | SOTA | No | Label-Embedding for Image Classification | 2015-03-30 | Code |
| 13 | GCN | 49.7 | - | No | All About Knowledge Graphs for Actions | 2020-08-28 | - |
| 14 | DEM | 49.5 | - | No | Learning a Deep Embedding Model for Zero-Shot Learning | 2016-11-15 | Code |
| 15 | ESZSL | 48.3 | - | No | - | - | Code |
| 16 | SJE(Word Embedding) | 48.2 | SOTA | No | Evaluation of Output Embeddings for Fine-Grained Image Classification | 2014-09-30 | Code |