Fangzhou Mu, Sicheng Mo, Gillian Wang, Yin Li
This report describes our submission to the Ego4D Moment Queries Challenge 2022. Our submission builds on ActionFormer, the state-of-the-art backbone for temporal action localization, and a trio of strong video features from SlowFast, Omnivore, and EgoVLP. Our solution is ranked 2nd on the public leaderboard with 21.76% average mAP on the test set, nearly three times the average mAP of the official baseline. Further, we obtain 42.54% Recall@1x at tIoU=0.5 on the test set, outperforming the top-ranked solution by 1.41 absolute percentage points. Our code is available at https://github.com/happyharrycn/actionformer_release.
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | Ego4D MQ test | Average mAP | 21.76 | ActionFormer (SlowFast+Omnivore+EgoVLP) |
| Video | Ego4D MQ test | Recall@1x (tIoU=0.5) | 42.54 | ActionFormer (SlowFast+Omnivore+EgoVLP) |
| Video | Ego4D MQ val | Average mAP | 21.4 | ActionFormer (SlowFast+Omnivore+EgoVLP) |
| Video | Ego4D MQ val | Recall@1x (tIoU=0.5) | 38.73 | ActionFormer (SlowFast+Omnivore+EgoVLP) |
| Temporal Action Localization | Ego4D MQ test | Average mAP | 21.76 | ActionFormer (SlowFast+Omnivore+EgoVLP) |
| Temporal Action Localization | Ego4D MQ test | Recall@1x (tIoU=0.5) | 42.54 | ActionFormer (SlowFast+Omnivore+EgoVLP) |
| Temporal Action Localization | Ego4D MQ val | Average mAP | 21.4 | ActionFormer (SlowFast+Omnivore+EgoVLP) |
| Temporal Action Localization | Ego4D MQ val | Recall@1x (tIoU=0.5) | 38.73 | ActionFormer (SlowFast+Omnivore+EgoVLP) |
| Zero-Shot Learning | Ego4D MQ test | Average mAP | 21.76 | ActionFormer (SlowFast+Omnivore+EgoVLP) |
| Zero-Shot Learning | Ego4D MQ test | Recall@1x (tIoU=0.5) | 42.54 | ActionFormer (SlowFast+Omnivore+EgoVLP) |
| Zero-Shot Learning | Ego4D MQ val | Average mAP | 21.4 | ActionFormer (SlowFast+Omnivore+EgoVLP) |
| Zero-Shot Learning | Ego4D MQ val | Recall@1x (tIoU=0.5) | 38.73 | ActionFormer (SlowFast+Omnivore+EgoVLP) |
| Action Localization | Ego4D MQ test | Average mAP | 21.76 | ActionFormer (SlowFast+Omnivore+EgoVLP) |
| Action Localization | Ego4D MQ test | Recall@1x (tIoU=0.5) | 42.54 | ActionFormer (SlowFast+Omnivore+EgoVLP) |
| Action Localization | Ego4D MQ val | Average mAP | 21.4 | ActionFormer (SlowFast+Omnivore+EgoVLP) |
| Action Localization | Ego4D MQ val | Recall@1x (tIoU=0.5) | 38.73 | ActionFormer (SlowFast+Omnivore+EgoVLP) |
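
To make the Recall@1x (tIoU=0.5) metric reported above more concrete, the following is a minimal sketch of how temporal IoU and Recall@kx could be computed for a single video and a single query category. It assumes the common reading of Recall@kx, where x is the number of ground-truth instances and a ground-truth instance counts as recalled if any of the top k·x scored predictions overlaps it with tIoU at or above the threshold. The function names `temporal_iou` and `recall_at_kx`, the segment format, and the toy data are illustrative assumptions and are not taken from the official evaluation code or from the ActionFormer repository.

```python
from typing import List, Tuple

Segment = Tuple[float, float]             # (start_sec, end_sec), assumed format
Prediction = Tuple[float, float, float]   # (start_sec, end_sec, score), assumed format


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection over union of two 1D temporal segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_kx(
    gt: List[Segment],
    preds: List[Prediction],
    k: int = 1,
    tiou: float = 0.5,
) -> float:
    """Fraction of ground-truth segments matched by the top k*x predictions.

    x is the number of ground-truth segments (assumed definition of Recall@kx).
    """
    if not gt:
        return 0.0
    # Keep only the k*x highest-scoring predictions.
    top = sorted(preds, key=lambda p: p[2], reverse=True)[: k * len(gt)]
    hits = sum(
        any(temporal_iou(g, (s, e)) >= tiou for s, e, _ in top) for g in gt
    )
    return hits / len(gt)


# Toy example (made-up numbers, not challenge data).
gt_segments = [(0.0, 10.0), (20.0, 30.0)]
predictions = [(1.0, 9.0, 0.9), (50.0, 60.0, 0.8), (21.0, 29.0, 0.7)]
r1 = recall_at_kx(gt_segments, predictions, k=1, tiou=0.5)
r2 = recall_at_kx(gt_segments, predictions, k=2, tiou=0.5)
```

In this toy case only the two highest-scoring predictions are kept for k=1, so the second ground-truth segment is missed; raising k to 2 admits the third prediction and recovers it. A benchmark-level number would additionally aggregate over videos and categories, which this sketch does not attempt.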