Tianwei Lin, Xu Zhao, Haisheng Su, Chongjing Wang, Ming Yang
Temporal action proposal generation is an important yet challenging problem, since temporal proposals with rich action content are indispensable for analysing real-world videos with long duration and a high proportion of irrelevant content. This problem requires methods not only to generate proposals with precise temporal boundaries, but also to retrieve proposals that cover ground-truth action instances with high recall and high overlap using relatively few proposals. To address these difficulties, we introduce an effective proposal generation method, named Boundary-Sensitive Network (BSN), which adopts a "local to global" fashion. Locally, BSN first locates temporal boundaries with high probabilities, then directly combines these boundaries as proposals. Globally, with the Boundary-Sensitive Proposal feature, BSN retrieves proposals by evaluating the confidence of whether a proposal contains an action within its region. We conduct experiments on two challenging datasets, ActivityNet-1.3 and THUMOS14, where BSN outperforms other state-of-the-art temporal action proposal generation methods with high recall and high temporal precision. Finally, further experiments demonstrate that, when combined with existing action classifiers, our method significantly improves the state-of-the-art temporal action detection performance.
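
The "local" step described above (locate high-probability boundaries, then combine them into proposals) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `threshold=0.9` value, the local-peak rule, and the use of the product of start and end probabilities as a score are all assumptions made for illustration, and the "global" confidence evaluation is not modelled here.

```python
import numpy as np


def candidate_boundaries(probs, threshold=0.9):
    """Return indices whose probability is high or a local peak.

    Both selection rules and the threshold value are illustrative
    assumptions, not values taken from the abstract.
    """
    probs = np.asarray(probs, dtype=float)
    picked = []
    for t, p in enumerate(probs):
        left = probs[t - 1] if t > 0 else -np.inf
        right = probs[t + 1] if t < len(probs) - 1 else -np.inf
        is_peak = p > left and p > right
        if p >= threshold or is_peak:
            picked.append(t)
    return picked


def generate_proposals(start_probs, end_probs, max_duration=None, threshold=0.9):
    """Pair every candidate start with every later candidate end.

    Returns (start, end, score) tuples sorted by descending score.
    The score is a placeholder: the product of the two boundary
    probabilities (an assumption for this sketch).
    """
    start_probs = np.asarray(start_probs, dtype=float)
    end_probs = np.asarray(end_probs, dtype=float)
    starts = candidate_boundaries(start_probs, threshold)
    ends = candidate_boundaries(end_probs, threshold)

    proposals = []
    for s in starts:
        for e in ends:
            if e <= s:
                continue  # an end must come after its start
            if max_duration is not None and e - s > max_duration:
                continue  # optionally drop overly long proposals
            proposals.append((s, e, float(start_probs[s] * end_probs[e])))

    proposals.sort(key=lambda p: p[2], reverse=True)
    return proposals


if __name__ == "__main__":
    # Toy per-snippet probabilities (made up for the example).
    start_probs = [0.1, 0.95, 0.2, 0.1, 0.1, 0.1]
    end_probs = [0.1, 0.1, 0.1, 0.2, 0.92, 0.3]
    for start, end, score in generate_proposals(start_probs, end_probs):
        print(f"proposal [{start}, {end}] score={score:.3f}")
```

In such a sketch, the candidate list grows with the product of the number of start and end candidates, which is why a later retrieval step that scores and ranks the proposals matters.
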
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | ActivityNet-1.3 | mAP | 30.03 | BSN |
| Video | ActivityNet-1.3 | mAP IOU@0.5 | 46.45 | BSN |
| Video | ActivityNet-1.3 | mAP IOU@0.75 | 29.96 | BSN |
| Video | ActivityNet-1.3 | mAP IOU@0.95 | 8.02 | BSN |
| Video | THUMOS’14 | mAP IOU@0.3 | 53.5 | BSN UNet |
| Video | THUMOS’14 | mAP IOU@0.4 | 45 | BSN UNet |
| Video | THUMOS’14 | mAP IOU@0.5 | 36.9 | BSN UNet |
| Video | THUMOS’14 | mAP IOU@0.6 | 28.4 | BSN UNet |
| Video | THUMOS’14 | mAP IOU@0.7 | 20 | BSN UNet |
| Video | THUMOS’14 | AR@50 | 37.46 | BSN + Soft-NMS |
| Video | THUMOS’14 | AR@100 | 46.06 | BSN + Soft-NMS |
| Video | THUMOS’14 | AR@200 | 53.21 | BSN + Soft-NMS |
| Video | THUMOS’14 | AR@500 | 60.64 | BSN + Soft-NMS |
| Video | THUMOS’14 | AR@1000 | 64.52 | BSN + Soft-NMS |
| Video | ActivityNet-1.3 | AR@100 | 74.16 | BSN |
| Video | ActivityNet-1.3 | AUC (test) | 66.26 | BSN |
| Video | ActivityNet-1.3 | AUC (val) | 66.17 | BSN |
| Temporal Action Localization | ActivityNet-1.3 | mAP | 30.03 | BSN |
| Temporal Action Localization | ActivityNet-1.3 | mAP IOU@0.5 | 46.45 | BSN |
| Temporal Action Localization | ActivityNet-1.3 | mAP IOU@0.75 | 29.96 | BSN |
| Temporal Action Localization | ActivityNet-1.3 | mAP IOU@0.95 | 8.02 | BSN |
| Temporal Action Localization | THUMOS’14 | mAP IOU@0.3 | 53.5 | BSN UNet |
| Temporal Action Localization | THUMOS’14 | mAP IOU@0.4 | 45 | BSN UNet |
| Temporal Action Localization | THUMOS’14 | mAP IOU@0.5 | 36.9 | BSN UNet |
| Temporal Action Localization | THUMOS’14 | mAP IOU@0.6 | 28.4 | BSN UNet |
| Temporal Action Localization | THUMOS’14 | mAP IOU@0.7 | 20 | BSN UNet |
| Temporal Action Localization | THUMOS’14 | AR@50 | 37.46 | BSN + Soft-NMS |
| Temporal Action Localization | THUMOS’14 | AR@100 | 46.06 | BSN + Soft-NMS |
| Temporal Action Localization | THUMOS’14 | AR@200 | 53.21 | BSN + Soft-NMS |
| Temporal Action Localization | THUMOS’14 | AR@500 | 60.64 | BSN + Soft-NMS |
| Temporal Action Localization | THUMOS’14 | AR@1000 | 64.52 | BSN + Soft-NMS |
| Temporal Action Localization | ActivityNet-1.3 | AR@100 | 74.16 | BSN |
| Temporal Action Localization | ActivityNet-1.3 | AUC (test) | 66.26 | BSN |
| Temporal Action Localization | ActivityNet-1.3 | AUC (val) | 66.17 | BSN |
| Zero-Shot Learning | ActivityNet-1.3 | mAP | 30.03 | BSN |
| Zero-Shot Learning | ActivityNet-1.3 | mAP IOU@0.5 | 46.45 | BSN |
| Zero-Shot Learning | ActivityNet-1.3 | mAP IOU@0.75 | 29.96 | BSN |
| Zero-Shot Learning | ActivityNet-1.3 | mAP IOU@0.95 | 8.02 | BSN |
| Zero-Shot Learning | THUMOS’14 | mAP IOU@0.3 | 53.5 | BSN UNet |
| Zero-Shot Learning | THUMOS’14 | mAP IOU@0.4 | 45 | BSN UNet |
| Zero-Shot Learning | THUMOS’14 | mAP IOU@0.5 | 36.9 | BSN UNet |
| Zero-Shot Learning | THUMOS’14 | mAP IOU@0.6 | 28.4 | BSN UNet |
| Zero-Shot Learning | THUMOS’14 | mAP IOU@0.7 | 20 | BSN UNet |
| Zero-Shot Learning | THUMOS’14 | AR@50 | 37.46 | BSN + Soft-NMS |
| Zero-Shot Learning | THUMOS’14 | AR@100 | 46.06 | BSN + Soft-NMS |
| Zero-Shot Learning | THUMOS’14 | AR@200 | 53.21 | BSN + Soft-NMS |
| Zero-Shot Learning | THUMOS’14 | AR@500 | 60.64 | BSN + Soft-NMS |
| Zero-Shot Learning | THUMOS’14 | AR@1000 | 64.52 | BSN + Soft-NMS |
| Zero-Shot Learning | ActivityNet-1.3 | AR@100 | 74.16 | BSN |
| Zero-Shot Learning | ActivityNet-1.3 | AUC (test) | 66.26 | BSN |
| Zero-Shot Learning | ActivityNet-1.3 | AUC (val) | 66.17 | BSN |
| Activity Recognition | THUMOS’14 | mAP@0.3 | 53.5 | BSN |
| Activity Recognition | THUMOS’14 | mAP@0.4 | 45 | BSN |
| Activity Recognition | THUMOS’14 | mAP@0.5 | 36.9 | BSN |
| Action Localization | ActivityNet-1.3 | mAP | 30.03 | BSN |
| Action Localization | ActivityNet-1.3 | mAP IOU@0.5 | 46.45 | BSN |
| Action Localization | ActivityNet-1.3 | mAP IOU@0.75 | 29.96 | BSN |
| Action Localization | ActivityNet-1.3 | mAP IOU@0.95 | 8.02 | BSN |
| Action Localization | THUMOS’14 | mAP IOU@0.3 | 53.5 | BSN UNet |
| Action Localization | THUMOS’14 | mAP IOU@0.4 | 45 | BSN UNet |
| Action Localization | THUMOS’14 | mAP IOU@0.5 | 36.9 | BSN UNet |
| Action Localization | THUMOS’14 | mAP IOU@0.6 | 28.4 | BSN UNet |
| Action Localization | THUMOS’14 | mAP IOU@0.7 | 20 | BSN UNet |
| Action Localization | THUMOS’14 | AR@50 | 37.46 | BSN + Soft-NMS |
| Action Localization | THUMOS’14 | AR@100 | 46.06 | BSN + Soft-NMS |
| Action Localization | THUMOS’14 | AR@200 | 53.21 | BSN + Soft-NMS |
| Action Localization | THUMOS’14 | AR@500 | 60.64 | BSN + Soft-NMS |
| Action Localization | THUMOS’14 | AR@1000 | 64.52 | BSN + Soft-NMS |
| Action Localization | ActivityNet-1.3 | AR@100 | 74.16 | BSN |
| Action Localization | ActivityNet-1.3 | AUC (test) | 66.26 | BSN |
| Action Localization | ActivityNet-1.3 | AUC (val) | 66.17 | BSN |
| Action Recognition | THUMOS’14 | mAP@0.3 | 53.5 | BSN |
| Action Recognition | THUMOS’14 | mAP@0.4 | 45 | BSN |
| Action Recognition | THUMOS’14 | mAP@0.5 | 36.9 | BSN |
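
The IoU-thresholded and AR@N metrics in the table rest on temporal overlap between a proposal and a ground-truth segment. The sketch below shows one simplified way to compute temporal IoU and recall over the top-N proposals of a single video. The function names, the default threshold set, and the single-video simplification are assumptions for illustration; the official benchmark protocols define their own threshold ranges and average over many videos.

```python
def temporal_iou(a, b):
    """Intersection over union of two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def recall_at(proposals, ground_truth, iou_threshold, top_n):
    """Fraction of ground-truth segments matched by any of the
    first top_n proposals at the given IoU threshold.

    `proposals` is assumed to be sorted by descending confidence.
    """
    if not ground_truth:
        return 0.0
    kept = proposals[:top_n]
    hits = sum(
        1
        for gt in ground_truth
        if any(temporal_iou(p, gt) >= iou_threshold for p in kept)
    )
    return hits / len(ground_truth)


def average_recall(proposals, ground_truth, top_n,
                   thresholds=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Mean recall over several IoU thresholds.

    The default thresholds are an illustrative assumption.
    """
    recalls = [
        recall_at(proposals, ground_truth, t, top_n) for t in thresholds
    ]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Toy segments in seconds (made up for the example).
    ground_truth = [(0, 10), (20, 30)]
    proposals = [(0, 9), (50, 60), (21, 30)]
    print(temporal_iou((0, 10), (5, 15)))
    print(recall_at(proposals, ground_truth, iou_threshold=0.5, top_n=1))
    print(recall_at(proposals, ground_truth, iou_threshold=0.5, top_n=3))
```

Read this way, a higher AR at a small N indicates that the top-ranked proposals already cover most ground-truth instances.
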