Tianwei Lin, Xiao Liu, Xin Li, Errui Ding, Shilei Wen
Temporal action proposal generation is a challenging and promising task that aims to locate temporal regions in real-world videos where actions or events may occur. Current bottom-up proposal generation methods can generate proposals with precise boundaries, but cannot efficiently generate adequately reliable confidence scores for retrieving proposals. To address these difficulties, we introduce the Boundary-Matching (BM) mechanism to evaluate confidence scores of densely distributed proposals, which denotes a proposal as a matching pair of starting and ending boundaries and combines all densely distributed BM pairs into the BM confidence map. Based on the BM mechanism, we propose an effective, efficient and end-to-end proposal generation method, named Boundary-Matching Network (BMN), which generates proposals with precise temporal boundaries as well as reliable confidence scores simultaneously. The two branches of BMN are jointly trained in a unified framework. We conduct experiments on two challenging datasets, THUMOS-14 and ActivityNet-1.3, where BMN shows significant performance improvement with remarkable efficiency and generalizability. Further, combined with an existing action classifier, BMN can achieve state-of-the-art temporal action detection performance.
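
To make the idea of a map of densely distributed boundary pairs concrete, here is a minimal sketch of how such proposals could be indexed. The layout (rows as durations, columns as start locations), the function name `bm_proposal_grid`, and its parameters are illustrative assumptions, not taken from the text above; the sketch only enumerates (start, end) pairs and does not compute any confidence score.

```python
from typing import List, Optional, Tuple

Cell = Optional[Tuple[int, int]]


def bm_proposal_grid(num_locations: int, max_duration: int) -> List[List[Cell]]:
    """Enumerate dense proposals as a (duration, start) grid.

    Assumed layout: grid[d][s] holds the (start, end) pair of the proposal
    that starts at location s and spans d + 1 locations. Cells whose end
    would fall past the last location are marked None.
    """
    grid: List[List[Cell]] = []
    for d in range(max_duration):
        row: List[Cell] = []
        for s in range(num_locations):
            end = s + d + 1
            row.append((s, end) if end <= num_locations else None)
        grid.append(row)
    return grid


# Hypothetical toy sizes: 4 temporal locations, proposals up to 3 locations long.
grid = bm_proposal_grid(num_locations=4, max_duration=3)
valid = sum(cell is not None for row in grid for cell in row)
```

In a layout like this, a network would predict one confidence value per valid cell, so every start/end pair is scored in a single pass rather than one proposal at a time.
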
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | ActivityNet-1.3 | mAP | 33.85 | BMN |
| Video | ActivityNet-1.3 | mAP IOU@0.5 | 50.07 | BMN |
| Video | ActivityNet-1.3 | mAP IOU@0.75 | 34.78 | BMN |
| Video | ActivityNet-1.3 | mAP IOU@0.95 | 8.29 | BMN |
| Video | FineAction | mAP | 9.25 | BMN (i3d feature) |
| Video | FineAction | mAP IOU@0.5 | 14.44 | BMN (i3d feature) |
| Video | FineAction | mAP IOU@0.75 | 8.92 | BMN (i3d feature) |
| Video | FineAction | mAP IOU@0.95 | 3.12 | BMN (i3d feature) |
| Video | THUMOS’14 | mAP IOU@0.5 | 32.2 | BMN |
| Video | EPIC-KITCHENS-100 | Avg mAP (0.1-0.5) | 8.4 | BMN (verb) |
| Video | EPIC-KITCHENS-100 | mAP IOU@0.1 | 10.8 | BMN (verb) |
| Video | EPIC-KITCHENS-100 | mAP IOU@0.2 | 9.8 | BMN (verb) |
| Video | EPIC-KITCHENS-100 | mAP IOU@0.3 | 8.4 | BMN (verb) |
| Video | EPIC-KITCHENS-100 | mAP IOU@0.4 | 7.1 | BMN (verb) |
| Video | EPIC-KITCHENS-100 | mAP IOU@0.5 | 5.6 | BMN (verb) |
| Video | ActivityNet-1.3 | AR@100 | 75.01 | BMN |
| Video | ActivityNet-1.3 | AUC (val) | 67.1 | BMN |
| Temporal Action Localization | ActivityNet-1.3 | mAP | 33.85 | BMN |
| Temporal Action Localization | ActivityNet-1.3 | mAP IOU@0.5 | 50.07 | BMN |
| Temporal Action Localization | ActivityNet-1.3 | mAP IOU@0.75 | 34.78 | BMN |
| Temporal Action Localization | ActivityNet-1.3 | mAP IOU@0.95 | 8.29 | BMN |
| Temporal Action Localization | FineAction | mAP | 9.25 | BMN (i3d feature) |
| Temporal Action Localization | FineAction | mAP IOU@0.5 | 14.44 | BMN (i3d feature) |
| Temporal Action Localization | FineAction | mAP IOU@0.75 | 8.92 | BMN (i3d feature) |
| Temporal Action Localization | FineAction | mAP IOU@0.95 | 3.12 | BMN (i3d feature) |
| Temporal Action Localization | THUMOS’14 | mAP IOU@0.5 | 32.2 | BMN |
| Temporal Action Localization | EPIC-KITCHENS-100 | Avg mAP (0.1-0.5) | 8.4 | BMN (verb) |
| Temporal Action Localization | EPIC-KITCHENS-100 | mAP IOU@0.1 | 10.8 | BMN (verb) |
| Temporal Action Localization | EPIC-KITCHENS-100 | mAP IOU@0.2 | 9.8 | BMN (verb) |
| Temporal Action Localization | EPIC-KITCHENS-100 | mAP IOU@0.3 | 8.4 | BMN (verb) |
| Temporal Action Localization | EPIC-KITCHENS-100 | mAP IOU@0.4 | 7.1 | BMN (verb) |
| Temporal Action Localization | EPIC-KITCHENS-100 | mAP IOU@0.5 | 5.6 | BMN (verb) |
| Temporal Action Localization | ActivityNet-1.3 | AR@100 | 75.01 | BMN |
| Temporal Action Localization | ActivityNet-1.3 | AUC (val) | 67.1 | BMN |
| Zero-Shot Learning | ActivityNet-1.3 | mAP | 33.85 | BMN |
| Zero-Shot Learning | ActivityNet-1.3 | mAP IOU@0.5 | 50.07 | BMN |
| Zero-Shot Learning | ActivityNet-1.3 | mAP IOU@0.75 | 34.78 | BMN |
| Zero-Shot Learning | ActivityNet-1.3 | mAP IOU@0.95 | 8.29 | BMN |
| Zero-Shot Learning | FineAction | mAP | 9.25 | BMN (i3d feature) |
| Zero-Shot Learning | FineAction | mAP IOU@0.5 | 14.44 | BMN (i3d feature) |
| Zero-Shot Learning | FineAction | mAP IOU@0.75 | 8.92 | BMN (i3d feature) |
| Zero-Shot Learning | FineAction | mAP IOU@0.95 | 3.12 | BMN (i3d feature) |
| Zero-Shot Learning | THUMOS’14 | mAP IOU@0.5 | 32.2 | BMN |
| Zero-Shot Learning | EPIC-KITCHENS-100 | Avg mAP (0.1-0.5) | 8.4 | BMN (verb) |
| Zero-Shot Learning | EPIC-KITCHENS-100 | mAP IOU@0.1 | 10.8 | BMN (verb) |
| Zero-Shot Learning | EPIC-KITCHENS-100 | mAP IOU@0.2 | 9.8 | BMN (verb) |
| Zero-Shot Learning | EPIC-KITCHENS-100 | mAP IOU@0.3 | 8.4 | BMN (verb) |
| Zero-Shot Learning | EPIC-KITCHENS-100 | mAP IOU@0.4 | 7.1 | BMN (verb) |
| Zero-Shot Learning | EPIC-KITCHENS-100 | mAP IOU@0.5 | 5.6 | BMN (verb) |
| Zero-Shot Learning | ActivityNet-1.3 | AR@100 | 75.01 | BMN |
| Zero-Shot Learning | ActivityNet-1.3 | AUC (val) | 67.1 | BMN |
| Activity Recognition | THUMOS’14 | mAP@0.3 | 56 | BMN |
| Activity Recognition | THUMOS’14 | mAP@0.4 | 47.4 | BMN |
| Activity Recognition | THUMOS’14 | mAP@0.5 | 38.8 | BMN |
| Action Localization | ActivityNet-1.3 | mAP | 33.85 | BMN |
| Action Localization | ActivityNet-1.3 | mAP IOU@0.5 | 50.07 | BMN |
| Action Localization | ActivityNet-1.3 | mAP IOU@0.75 | 34.78 | BMN |
| Action Localization | ActivityNet-1.3 | mAP IOU@0.95 | 8.29 | BMN |
| Action Localization | FineAction | mAP | 9.25 | BMN (i3d feature) |
| Action Localization | FineAction | mAP IOU@0.5 | 14.44 | BMN (i3d feature) |
| Action Localization | FineAction | mAP IOU@0.75 | 8.92 | BMN (i3d feature) |
| Action Localization | FineAction | mAP IOU@0.95 | 3.12 | BMN (i3d feature) |
| Action Localization | THUMOS’14 | mAP IOU@0.5 | 32.2 | BMN |
| Action Localization | EPIC-KITCHENS-100 | Avg mAP (0.1-0.5) | 8.4 | BMN (verb) |
| Action Localization | EPIC-KITCHENS-100 | mAP IOU@0.1 | 10.8 | BMN (verb) |
| Action Localization | EPIC-KITCHENS-100 | mAP IOU@0.2 | 9.8 | BMN (verb) |
| Action Localization | EPIC-KITCHENS-100 | mAP IOU@0.3 | 8.4 | BMN (verb) |
| Action Localization | EPIC-KITCHENS-100 | mAP IOU@0.4 | 7.1 | BMN (verb) |
| Action Localization | EPIC-KITCHENS-100 | mAP IOU@0.5 | 5.6 | BMN (verb) |
| Action Localization | ActivityNet-1.3 | AR@100 | 75.01 | BMN |
| Action Localization | ActivityNet-1.3 | AUC (val) | 67.1 | BMN |
| Action Recognition | THUMOS’14 | mAP@0.3 | 56 | BMN |
| Action Recognition | THUMOS’14 | mAP@0.4 | 47.4 | BMN |
| Action Recognition | THUMOS’14 | mAP@0.5 | 38.8 | BMN |
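
The "mAP IOU@t" rows above count a predicted segment as correct only if its temporal overlap with a ground-truth segment reaches the threshold t. The sketch below shows that overlap test under simple assumptions: segments are (start, end) pairs in seconds, and the helper names `temporal_iou` and `is_true_positive` are hypothetical. It is not the official evaluation code of any of the listed benchmarks, which also handle ranking, class labels, and duplicate matches.

```python
from typing import Tuple

Segment = Tuple[float, float]  # (start, end) in seconds, with start < end


def temporal_iou(pred: Segment, gt: Segment) -> float:
    """Intersection over union of two 1-D temporal segments."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred: Segment, gt: Segment, threshold: float) -> bool:
    """A prediction matches the ground truth if tIoU >= threshold."""
    return temporal_iou(pred, gt) >= threshold


# Toy example: 4 s of overlap over an 8 s union gives tIoU = 0.5.
pred, gt = (2.0, 8.0), (4.0, 10.0)
iou = temporal_iou(pred, gt)
hit_at_050 = is_true_positive(pred, gt, 0.5)
hit_at_075 = is_true_positive(pred, gt, 0.75)
```

The same prediction passes at the 0.5 threshold and fails at 0.75, which is why scores in the table drop as the threshold rises.
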