Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Multi-granularity Generator for Temporal Action Proposal

Yuan Liu, Lin Ma, Yifeng Zhang, Wei Liu, Shih-Fu Chang

Published: 2018-11-28 · CVPR 2019
Tasks: Temporal Action Proposal Generation, Action Recognition
Paper · PDF

Abstract

Temporal action proposal generation is an important task that aims to localize the video segments containing human actions in an untrimmed video. In this paper, we propose a multi-granularity generator (MGG) that performs temporal action proposal generation from different granularity perspectives, relying on video visual features augmented with position embedding information. First, we propose a bilinear matching model to exploit the rich local information within the video sequence. Two components, a segment proposal producer (SPP) and a frame actionness producer (FAP), are then combined to perform temporal action proposal generation at two distinct granularities: SPP considers the whole video in the form of a feature pyramid and generates segment proposals from a coarse perspective, while FAP carries out a finer actionness evaluation for each video frame. Our proposed MGG can be trained in an end-to-end fashion. By temporally adjusting the segment proposals with fine-grained frame actionness information, MGG achieves superior performance over state-of-the-art methods on the public THUMOS-14 and ActivityNet-1.3 datasets. Moreover, we employ existing action classifiers to classify the proposals generated by MGG, leading to significant improvements over competing methods on the video detection task.
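The final step described above, temporally adjusting coarse segment proposals with fine-grained frame actionness scores, can be sketched as follows. The snapping rule, search window, and threshold here are illustrative assumptions for a minimal sketch, not the paper's exact refinement procedure:

```python
import numpy as np

def refine_proposal(start, end, actionness, max_shift=5, threshold=0.5):
    """Shift a segment proposal's boundary frames toward the nearest
    nearby frame whose actionness score reaches `threshold`.
    (Hypothetical adjustment rule; MGG's actual procedure may differ.)"""
    n = len(actionness)

    def snap(idx):
        # Search outward from the boundary, preferring the closest
        # high-actionness frame within `max_shift` frames.
        for shift in range(max_shift + 1):
            for cand in (idx - shift, idx + shift):
                if 0 <= cand < n and actionness[cand] >= threshold:
                    return cand
        return idx  # no confident frame nearby; keep the boundary

    return snap(start), snap(end)

# Toy frame-level actionness curve for a 12-frame clip.
scores = np.array([0.1, 0.2, 0.7, 0.9, 0.8, 0.9, 0.85, 0.6,
                   0.3, 0.1, 0.05, 0.0])
print(refine_proposal(1, 9, scores))  # → (2, 7): boundaries snap inward
```

The coarse proposal (frames 1–9) is tightened to the span where per-frame actionness is high, which is the intuition behind combining SPP's segment-level and FAP's frame-level outputs.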

Results

Task                         | Dataset         | Metric    | Value | Model
-----------------------------|-----------------|-----------|-------|---------
Video                        | ActivityNet-1.3 | AR@100    | 74.54 | MGG
Video                        | ActivityNet-1.3 | AUC (val) | 66.43 | MGG
Temporal Action Localization | ActivityNet-1.3 | AR@100    | 74.54 | MGG
Temporal Action Localization | ActivityNet-1.3 | AUC (val) | 66.43 | MGG
Zero-Shot Learning           | ActivityNet-1.3 | AR@100    | 74.54 | MGG
Zero-Shot Learning           | ActivityNet-1.3 | AUC (val) | 66.43 | MGG
Activity Recognition         | THUMOS'14       | mAP@0.3   | 53.9  | MGG UNet
Activity Recognition         | THUMOS'14       | mAP@0.4   | 46.8  | MGG UNet
Activity Recognition         | THUMOS'14       | mAP@0.5   | 37.4  | MGG UNet
Action Localization          | ActivityNet-1.3 | AR@100    | 74.54 | MGG
Action Localization          | ActivityNet-1.3 | AUC (val) | 66.43 | MGG
Action Recognition           | THUMOS'14       | mAP@0.3   | 53.9  | MGG UNet
Action Recognition           | THUMOS'14       | mAP@0.4   | 46.8  | MGG UNet
Action Recognition           | THUMOS'14       | mAP@0.5   | 37.4  | MGG UNet
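The THUMOS'14 rows report mAP at temporal-IoU (tIoU) thresholds of 0.3, 0.4, and 0.5: a detected segment counts as correct only when its temporal overlap with a ground-truth action meets the threshold. A minimal sketch of the tIoU computation underlying these metrics:

```python
def temporal_iou(seg_a, seg_b):
    """Temporal IoU between two (start, end) segments
    given in seconds or frame indices."""
    inter = max(0.0, min(seg_a[1], seg_b[1]) - max(seg_a[0], seg_b[0]))
    union = (seg_a[1] - seg_a[0]) + (seg_b[1] - seg_b[0]) - inter
    return inter / union if union > 0 else 0.0

# Overlapping segments: intersection 4, union 8.
print(temporal_iou((2.0, 8.0), (4.0, 10.0)))  # → 0.5, a hit at mAP@0.5
```

Raising the threshold from 0.3 to 0.5 demands tighter localization, which is why the reported mAP drops from 53.9 to 37.4.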

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Adapting Vision-Language Models for Evaluating World Models (2025-06-22)
Active Multimodal Distillation for Few-shot Action Recognition (2025-06-16)