Action Recognition on EPIC-KITCHENS-100
Metric: Verb@1 (higher is better)
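As a rough illustration of the metric, Verb@1 is read here as top-1 accuracy on the verb class: the percentage of action segments whose highest-scoring predicted verb matches the ground-truth verb. The sketch below assumes that reading. The function name `top1_accuracy` and the class ids are hypothetical and are not taken from the EPIC-KITCHENS-100 evaluation code.

```python
from typing import Sequence


def top1_accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Return the percentage of segments whose predicted class equals the label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one labelled segment is required")
    # Count the segments where the top-scoring class matches the ground truth.
    correct = sum(p == a for p, a in zip(predicted, actual))
    return 100.0 * correct / len(actual)


# Hypothetical verb class ids for four action segments (illustrative only).
predicted_verbs = [3, 1, 7, 2]
actual_verbs = [3, 1, 4, 2]

verb_at_1 = top1_accuracy(predicted_verbs, actual_verbs)
print(f"Verb@1 = {verb_at_1:.1f}")  # prints "Verb@1 = 75.0"
```

Three of the four hypothetical segments are classified correctly, so the score is 75.0 on the same 0 to 100 scale used in the results.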
Results
| # | Model | Verb@1 | Extra Data | Paper | Date | Code |
|---|-------|--------|------------|-------|------|------|
| 1 | TIM | 76.2 | Yes | TIM: A Time Interval Machine for Audio-Visual Action Recognition | 2024-04-08 | Yes |
| 2 | LLaVAction | 76 | Yes | LLaVAction: evaluating and training multi-modal large language models for action recognition | 2025-03-24 | Yes |
| 3 | LVMAE | 75 | Yes | Extending Video Masked Autoencoders to 128 frames | 2024-11-20 | - |
| 4 | Avion (ViT-L) | 73 | Yes | Training a Large Video Model on a Single Machine in a Day | 2023-09-28 | Yes |
| 5 | CAST(ViT-B/16) | 72.5 | No | CAST: Cross-Attention in Space and Time for Video Action Recognition | 2023-11-30 | Yes |
| 6 | MoViNet-A6 | 72.2 | No | MoViNets: Mobile Video Networks for Efficient Video Recognition | 2021-03-21 | Yes |
| 7 | M&M (WTS 60M) | 72 | Yes | M&M Mix: A Multimodal Multiview Transformer Ensemble | 2022-06-20 | - |
| 8 | LaViLa (TimeSformer-L) | 72 | Yes | Learning Video Representations from Large Language Models | 2022-12-08 | Yes |
| 9 | TAdaFormer-L/14 | 71.7 | Yes | Temporally-Adaptive Models for Efficient Video Understanding | 2023-08-10 | Yes |
| 10 | MeMViT-24 | 71.4 | Yes | MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition | 2022-01-20 | Yes |
| 11 | TAdaConvNeXtV2-S | 71 | Yes | Temporally-Adaptive Models for Efficient Video Understanding | 2023-08-10 | Yes |
| 12 | AVT | 70.4 | No | - | - | - |
| 13 | MMT | 70.1 | No | - | - | - |
| 14 | MTV-B (WTS 60M) | 69.9 | Yes | Multiview Transformers for Video Recognition | 2022-01-12 | Yes |
| 15 | OMNIVORE (Swin-B, finetuned) | 69.5 | Yes | Omnivore: A Single Model for Many Visual Modalities | 2022-01-20 | Yes |
| 16 | MoViNet-A5 | 69.1 | No | MoViNets: Mobile Video Networks for Efficient Video Recognition | 2021-03-21 | Yes |
| 17 | GSF | 69.06 | Yes | Gate-Shift-Fuse for Video Action Recognition | 2022-03-16 | Yes |
| 18 | MoViNet-A4 | 68.8 | No | MoViNets: Mobile Video Networks for Efficient Video Recognition | 2021-03-21 | Yes |
| 19 | ORViT Mformer-L (ORViT blocks) | 68.4 | No | Object-Region Video Transformers | 2021-10-13 | Yes |
| 20 | Mformer-L | 67.1 | Yes | Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers | 2021-06-09 | Yes |
| 21 | MoViNet-A2 | 67.1 | No | MoViNets: Mobile Video Networks for Efficient Video Recognition | 2021-03-21 | Yes |
| 22 | Mformer-HR | 67 | Yes | Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers | 2021-06-09 | Yes |
| 23 | Mformer | 66.7 | Yes | Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers | 2021-06-09 | Yes |
| 24 | ViViT-L/16x2 Fact. encoder | 66.4 | No | ViViT: A Video Vision Transformer | 2021-03-29 | Yes |
| 25 | TempAgg | 66 | No | Technical Report: Temporal Aggregate Representations | 2021-06-06 | Yes |
| 26 | MBT | 64.8 | No | Attention Bottlenecks for Multimodal Fusion | 2021-06-30 | Yes |
| 27 | MoViNet-A0 | 64.8 | No | MoViNets: Mobile Video Networks for Efficient Video Recognition | 2021-03-21 | Yes |

Entries carrying a SOTA badge: TIM, Avion (ViT-L), MoViNet-A6.