Metric: Frame-mAP at IoU threshold 0.5 (higher is better)
| # | Model | Frame-mAP@0.5 (%) | Extra Data | Paper | Date | Code |
|---|---|---|---|---|---|---|
| 1 | SiA | 88.5 | No | Scaling Open-Vocabulary Action Detection | 2025-04-04 | Code |
| 2 | HIT | 83.8 | No | Holistic Interaction Transformer Network for Act... | 2022-10-23 | Code |
| 3 | HISAN (VGG-16) | 76.72 | No | - | - | - |
| 4 | YOWO + LFB | 75.7 | No | You Only Watch Once: A Unified CNN Architecture ... | 2019-11-15 | Code |
| 5 | YOWO | 74.4 | No | You Only Watch Once: A Unified CNN Architecture ... | 2019-11-15 | Code |
| 6 | MOC | 74 | No | Actions as Moving Points | 2020-01-14 | Code |
| 7 | Faster-RCNN + two-stream I3D conv | 73.3 | No | AVA: A Video Dataset of Spatio-temporally Locali... | 2017-05-23 | Code |
| 8 | TACNet | 65.5 | No | TACNet: Transition-Aware Context Network for Spa... | 2019-05-31 | - |
| 9 | T-CNN | 61.3 | No | Tube Convolutional Neural Network (T-CNN) for Ac... | 2017-03-30 | Code |
| 10 | MR-TS R-CNN | 58.5 | No | - | - | - |
| 11 | TS R-CNN | 56.9 | No | - | - | - |
| 12 | Actionness | 39.9 | No | Actionness Estimation Using Hybrid Fully Convolu... | 2016-04-25 | - |
| 13 | Action Tubes | 36.2 | No | Finding Action Tubes | 2014-11-21 | Code |
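The leaderboard does not spell out the evaluation protocol behind Frame-mAP@0.5. The following is a minimal sketch of how a frame-level mAP at IoU 0.5 is commonly computed. It assumes PASCAL VOC-style all-point interpolated AP, greedy matching of score-sorted detections to the highest-IoU unmatched ground-truth box in the same frame, and boxes given as `(x1, y1, x2, y2)`. The function names (`iou`, `average_precision`, `frame_map`) and the toy data are hypothetical illustrations, not the official evaluation code of any benchmark or paper listed above. Official scripts may differ in details such as interpolation or the handling of ignored boxes.

```python
from collections import defaultdict


def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def average_precision(recalls, precisions):
    """All-point interpolated AP (assumed VOC 2010+ convention)."""
    mrec = [0.0] + recalls + [1.0]
    mpre = [0.0] + precisions + [0.0]
    # Make precision monotonically non-increasing from right to left.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangle areas under the precision envelope.
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1] for i in range(len(mrec) - 1))


def frame_map(detections, ground_truth, iou_thresh=0.5):
    """Hypothetical frame-mAP: mean over classes of per-frame-box AP.

    detections:   list of (frame_id, label, score, box)
    ground_truth: list of (frame_id, label, box)
    Detections of classes absent from ground_truth are ignored.
    """
    classes = sorted({g[1] for g in ground_truth})
    aps = {}
    for c in classes:
        gts = defaultdict(list)  # frame_id -> list of GT boxes of class c
        for f, lab, box in ground_truth:
            if lab == c:
                gts[f].append(box)
        n_gt = sum(len(v) for v in gts.values())
        matched = {f: [False] * len(v) for f, v in gts.items()}
        dets = sorted((d for d in detections if d[1] == c), key=lambda d: -d[2])
        tp = fp = 0
        recalls, precisions = [], []
        for f, _, _, box in dets:
            # Find the GT box in the same frame with the highest IoU.
            best, best_j = 0.0, -1
            for j, g in enumerate(gts.get(f, [])):
                o = iou(box, g)
                if o > best:
                    best, best_j = o, j
            # True positive only if IoU clears the threshold and that GT is unused.
            if best_j >= 0 and best >= iou_thresh and not matched[f][best_j]:
                matched[f][best_j] = True
                tp += 1
            else:
                fp += 1
            recalls.append(tp / n_gt)
            precisions.append(tp / (tp + fp))
        aps[c] = average_precision(recalls, precisions)
    return sum(aps.values()) / len(aps), aps


# Toy example: two "run" frames, one "jump" frame.
gt = [(0, "run", (0, 0, 10, 10)), (1, "run", (0, 0, 10, 10)),
      (0, "jump", (20, 20, 30, 30))]
dets = [(0, "run", 0.9, (0, 0, 10, 10)), (1, "run", 0.8, (50, 50, 60, 60)),
        (1, "run", 0.7, (0, 0, 10, 10)), (0, "jump", 0.6, (20, 20, 30, 30))]
m, aps = frame_map(dets, gt)
print(round(m, 3))  # 0.917
```

In this toy run the "run" class gets AP 5/6 because a false positive is ranked between its two true positives, while "jump" gets AP 1.0. Since leaderboard values are reported as percentages, a score like 88.5 corresponds to an mAP of 0.885 under whatever protocol the benchmark actually uses.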