Metric: mAP (Val) (higher is better)
| # | Model | mAP (Val) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | STAR/L | 41.7 | Yes | End-to-End Spatio-Temporal Action Localisation w... | 2023-04-24 | - |
| 2 | CQVAD | 38.4 | No | Classification Matters: Improving Video Action D... | 2024-07-29 | - |
| 3 | ACAR-Net, SlowFast R-101 (Kinetics-400 pretraining) | 30 | No | Actor-Context-Actor Relation Network for Spatio-... | 2020-06-14 | Code |
| 4 | JMRN + SlowFast-R101-NL | 28.4 | No | Pose And Joint-Aware Action Recognition | 2020-10-16 | Code |
| 5 | SlowFast++ (Kinetics-600 pretraining, NL) | 28.3 | No | SlowFast Networks for Video Recognition | 2018-12-10 | Code |
| 6 | LFB (Kinetics-400 pretraining) | 27.7 | No | Long-Term Feature Banks for Detailed Video Under... | 2018-12-12 | Code |
| 7 | I3D Tx HighRes | 27.6 | No | Video Action Transformer Network | 2018-12-06 | - |
| 8 | SlowFast (Kinetics-600 pretraining, NL) | 27.3 | No | SlowFast Networks for Video Recognition | 2018-12-10 | Code |
| 9 | SlowFast (Kinetics-600 pretraining) | 26.8 | No | SlowFast Networks for Video Recognition | 2018-12-10 | Code |
| 10 | SlowFast (Kinetics-400 pretraining) | 26.3 | No | SlowFast Networks for Video Recognition | 2018-12-10 | Code |
| 11 | I3D I3D | 23.4 | No | Video Action Transformer Network | 2018-12-06 | - |
| 12 | D3D (ResNet RPN, Kinetics-400 pretraining) | 23 | No | D3D: Distilled 3D Networks for Video Action Reco... | 2018-12-19 | Code |
| 13 | I3D w/ RPN + JFT (Kinetics-400 pretraining) | 22.8 | No | A Better Baseline for AVA | 2018-07-26 | - |
| 14 | S3D-G w/ ResNet RPN (Kinetics-400 pretraining) | 22 | No | AVA: A Video Dataset of Spatio-temporally Locali... | 2017-05-23 | Code |
| 15 | I3D w/ RPN (Kinetics-400 pretraining) | 21.9 | No | A Better Baseline for AVA | 2018-07-26 | - |
| 16 | YOWO+LFB* | 19.2 | No | You Only Watch Once: A Unified CNN Architecture ... | 2019-11-15 | Code |
| 17 | ACRN | 17.4 | No | Actor-Centric Relation Network | 2018-07-28 | Code |