Action Recognition on EPIC-KITCHENS-100
Metric: Action@1 (top-1 action accuracy in %, higher is better)
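In EPIC-KITCHENS-100 an action is a (verb, noun) pair. The sketch below assumes the usual convention that a clip counts as correct for Action@1 only when both its top-1 verb and its top-1 noun match the ground truth. This is why action accuracy sits well below verb or noun accuracy alone. The function name `action_top1` and its plain integer-label inputs are hypothetical and do not come from any official evaluation toolkit. A model that scores joint action classes would first map its top-scoring action back to a verb and a noun.

```python
from typing import Sequence


def action_top1(
    pred_verbs: Sequence[int],
    pred_nouns: Sequence[int],
    true_verbs: Sequence[int],
    true_nouns: Sequence[int],
) -> float:
    """Return Action@1 as a percentage (hypothetical helper).

    Assumption: a clip is correct only if the top-1 verb AND the top-1
    noun both equal the ground-truth labels for that clip.
    """
    n = len(true_verbs)
    # All four label sequences must describe the same set of clips.
    if not (len(pred_verbs) == len(pred_nouns) == len(true_nouns) == n):
        raise ValueError("all label sequences must have the same length")
    if n == 0:
        raise ValueError("need at least one clip")
    # Count clips where the verb and the noun are both right.
    correct = sum(
        1
        for pv, pn, tv, tn in zip(pred_verbs, pred_nouns, true_verbs, true_nouns)
        if pv == tv and pn == tn
    )
    return 100.0 * correct / n


# Four clips: clip 1 gets the noun wrong, clip 2 gets the verb wrong.
score = action_top1([1, 2, 3, 4], [10, 20, 30, 40], [1, 2, 0, 4], [10, 99, 30, 40])
print(score)  # 50.0
```

In this toy example verb accuracy and noun accuracy are each 75%, but only two of the four clips get both right, so Action@1 is 50%.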
Results
| # | Model | Action@1 (%) | Extra data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | LLaVAction | 58.3 | Yes | LLaVAction: evaluating and training multi-modal large language models for action recognition | 2025-03-24 | Yes |
| 2 | TIM | 56.4 | Yes | TIM: A Time Interval Machine for Audio-Visual Action Recognition | 2024-04-08 | Yes |
| 3 | Avion (ViT-L) | 54.4 | Yes | Training a Large Video Model on a Single Machine in a Day | 2023-09-28 | Yes |
| 4 | M&M (WTS 60M) | 53.6 | Yes | M&M Mix: A Multimodal Multiview Transformer Ensemble | 2022-06-20 | No |
| 5 | LVMAE | 52.1 | Yes | Extending Video Masked Autoencoders to 128 frames | 2024-11-20 | No |
| 6 | TAdaFormer-L/14 | 51.8 | Yes | Temporally-Adaptive Models for Efficient Video Understanding | 2023-08-10 | Yes |
| 7 | LaViLa (TimeSformer-L) | 51 | Yes | Learning Video Representations from Large Language Models | 2022-12-08 | Yes |
| 8 | MTV-B (WTS 60M) | 50.5 | Yes | Multiview Transformers for Video Recognition | 2022-01-12 | Yes |
| 9 | OMNIVORE (Swin-B, finetuned) | 49.9 | Yes | Omnivore: A Single Model for Many Visual Modalities | 2022-01-20 | Yes |
| 10 | CAST (ViT-B/16) | 49.3 | No | CAST: Cross-Attention in Space and Time for Video Action Recognition | 2023-11-30 | Yes |
| 11 | TAdaConvNeXtV2-S | 48.9 | Yes | Temporally-Adaptive Models for Efficient Video Understanding | 2023-08-10 | Yes |
| 12 | MeMViT-24 | 48.4 | Yes | MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition | 2022-01-20 | Yes |
| 13 | MMT | 47.8 | No | - | - | No |
| 14 | MoViNet-A6 | 47.7 | No | MoViNets: Mobile Video Networks for Efficient Video Recognition | 2021-03-21 | Yes |
| 15 | AVT | 47.2 | No | - | - | No |
| 16 | ORViT Mformer-L (ORViT blocks) | 45.7 | No | Object-Region Video Transformers | 2021-10-13 | Yes |
| 17 | TempAgg | 45.26 | No | Technical Report: Temporal Aggregate Representations | 2021-06-06 | Yes |
| 18 | MoViNet-A5 | 44.5 | No | MoViNets: Mobile Video Networks for Efficient Video Recognition | 2021-03-21 | Yes |
| 19 | Mformer-HR | 44.5 | Yes | Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers | 2021-06-09 | Yes |
| 20 | GSF | 44.48 | Yes | Gate-Shift-Fuse for Video Action Recognition | 2022-03-16 | Yes |
| 21 | MoViNet-A4 | 44.4 | No | MoViNets: Mobile Video Networks for Efficient Video Recognition | 2021-03-21 | Yes |
| 22 | Mformer-L | 44.1 | Yes | Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers | 2021-06-09 | Yes |
| 23 | ViViT-L/16x2 Fact. encoder | 44 | No | ViViT: A Video Vision Transformer | 2021-03-29 | Yes |
| 24 | MBT | 43.4 | No | Attention Bottlenecks for Multimodal Fusion | 2021-06-30 | Yes |
| 25 | Mformer | 43.1 | Yes | Keeping Your Eye on the Ball: Trajectory Attention in Video Transformers | 2021-06-09 | Yes |
| 26 | MoViNet-A2 | 41.2 | No | MoViNets: Mobile Video Networks for Efficient Video Recognition | 2021-03-21 | Yes |
| 27 | TSM | 37.39 | No | Rescaling Egocentric Vision | 2020-06-23 | Yes |
| 28 | SlowFast | 36.81 | No | Rescaling Egocentric Vision | 2020-06-23 | Yes |
| 29 | MoViNet-A0 | 36.8 | No | MoViNets: Mobile Video Networks for Efficient Video Recognition | 2021-03-21 | Yes |
| 30 | TBN | 35.55 | No | Rescaling Egocentric Vision | 2020-06-23 | Yes |
| 31 | TRN | 35.28 | No | Rescaling Egocentric Vision | 2020-06-23 | Yes |
| 32 | TSN | 33.57 | No | Rescaling Egocentric Vision | 2020-06-23 | Yes |