Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


BMN: Boundary-Matching Network for Temporal Action Proposal Generation

Tianwei Lin, Xiao Liu, Xin Li, Errui Ding, Shilei Wen

2019-07-23 · ICCV 2019 · Tasks: Action Detection, Temporal Action Proposal Generation, Action Recognition, Temporal Action Localization
Links: Paper · PDF · Code (one official implementation plus community implementations)

Abstract

Temporal action proposal generation is a challenging yet promising task that aims to locate temporal regions of real-world videos where actions or events may occur. Current bottom-up proposal generation methods can generate proposals with precise boundaries, but cannot efficiently generate sufficiently reliable confidence scores for retrieving proposals. To address these difficulties, we introduce the Boundary-Matching (BM) mechanism for evaluating the confidence scores of densely distributed proposals: each proposal is denoted as a matching pair of starting and ending boundaries, and all densely distributed BM pairs are combined into a BM confidence map. Based on the BM mechanism, we propose an effective, efficient, and end-to-end proposal generation method, named Boundary-Matching Network (BMN), which simultaneously generates proposals with precise temporal boundaries and reliable confidence scores. The two branches of BMN are jointly trained in a unified framework. We conduct experiments on two challenging datasets, THUMOS-14 and ActivityNet-1.3, where BMN shows significant performance improvement with remarkable efficiency and generalizability. Furthermore, combined with an existing action classifier, BMN achieves state-of-the-art temporal action detection performance.
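To make the BM mechanism concrete: each entry of the BM confidence map scores one (start, duration) pair, so decoding the map enumerates all densely distributed proposals with their confidences. The sketch below is a hedged illustration of that decoding step only, not the paper's implementation; the names `decode_bm_map`, `bm_map`, and `tscale` are assumptions for this example.

```python
def decode_bm_map(bm_map, tscale):
    """Turn a BM confidence map into scored temporal proposals.

    bm_map[d][s] is the confidence that a proposal starting at
    temporal index s with duration d + 1 contains an action.
    tscale is the number of temporal positions in the video.
    Returns (start, end, confidence) tuples, normalized to [0, 1],
    sorted by confidence (highest first).
    """
    proposals = []
    for d, row in enumerate(bm_map):          # duration axis
        for s, conf in enumerate(row):        # start axis
            e = s + d + 1                     # exclusive end index
            if e <= tscale:                   # drop proposals past the video end
                proposals.append((s / tscale, e / tscale, conf))
    proposals.sort(key=lambda p: p[2], reverse=True)
    return proposals


# Toy 2x4 map: durations 1-2 over 4 temporal positions.
toy_map = [[0.1, 0.2, 0.3, 0.4],
           [0.9, 0.8, 0.7, 0.6]]
top = decode_bm_map(toy_map, tscale=4)[0]     # (0.0, 0.5, 0.9)
```

In BMN itself this map is produced by a dedicated confidence-evaluation branch and the decoded proposals are further filtered (e.g. by non-maximum suppression) before retrieval.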

Results

The same result set is listed identically under four task leaderboards (Video, Temporal Action Localization, Zero-Shot Learning, and Action Localization); it is shown once below.

| Dataset | Metric | Value | Model |
|---|---|---|---|
| ActivityNet-1.3 | mAP | 33.85 | BMN |
| ActivityNet-1.3 | mAP IOU@0.5 | 50.07 | BMN |
| ActivityNet-1.3 | mAP IOU@0.75 | 34.78 | BMN |
| ActivityNet-1.3 | mAP IOU@0.95 | 8.29 | BMN |
| ActivityNet-1.3 | AR@100 | 75.01 | BMN |
| ActivityNet-1.3 | AUC (val) | 67.1 | BMN |
| FineAction | mAP | 9.25 | BMN (I3D feature) |
| FineAction | mAP IOU@0.5 | 14.44 | BMN (I3D feature) |
| FineAction | mAP IOU@0.75 | 8.92 | BMN (I3D feature) |
| FineAction | mAP IOU@0.95 | 3.12 | BMN (I3D feature) |
| THUMOS’14 | mAP IOU@0.5 | 32.2 | BMN |
| EPIC-KITCHENS-100 | Avg mAP (0.1-0.5) | 8.4 | BMN (verb) |
| EPIC-KITCHENS-100 | mAP IOU@0.1 | 10.8 | BMN (verb) |
| EPIC-KITCHENS-100 | mAP IOU@0.2 | 9.8 | BMN (verb) |
| EPIC-KITCHENS-100 | mAP IOU@0.3 | 8.4 | BMN (verb) |
| EPIC-KITCHENS-100 | mAP IOU@0.4 | 7.1 | BMN (verb) |
| EPIC-KITCHENS-100 | mAP IOU@0.5 | 5.6 | BMN (verb) |

The rows below are listed identically under the Activity Recognition and Action Recognition leaderboards.

| Dataset | Metric | Value | Model |
|---|---|---|---|
| THUMOS’14 | mAP@0.3 | 56 | BMN |
| THUMOS’14 | mAP@0.4 | 47.4 | BMN |
| THUMOS’14 | mAP@0.5 | 38.8 | BMN |
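The IoU-thresholded metrics above all derive from temporal intersection-over-union between a predicted and a ground-truth segment, and "Avg mAP" simply averages the per-threshold mAP values. A minimal, generic sketch (the function name `temporal_iou` is illustrative, not from any benchmark toolkit):

```python
def temporal_iou(seg_a, seg_b):
    """IoU of two temporal segments given as (start, end) pairs in seconds."""
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - inter
    return inter / union if union > 0 else 0.0


# A prediction counts as a match at threshold t when IoU >= t; mAP is
# computed per threshold, then averaged. Using the EPIC-KITCHENS-100
# per-threshold values from the table (IoU 0.1 .. 0.5):
per_threshold_map = [10.8, 9.8, 8.4, 7.1, 5.6]
avg_map = sum(per_threshold_map) / len(per_threshold_map)  # ~8.3
```

Averaging the rounded table values gives about 8.3, slightly below the listed 8.4; the leaderboard presumably averages unrounded per-threshold values.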

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
CBF-AFA: Chunk-Based Multi-SSL Fusion for Automatic Fluency Assessment (2025-06-25)
MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans (2025-06-25)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)