
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

AOE-Net: Entities Interactions Modeling with Adaptive Attention Mechanism for Temporal Action Proposals Generation

Khoa Vo, Sang Truong, Kashu Yamazaki, Bhiksha Raj, Minh-Triet Tran, Ngan Le

2022-10-05 | Action Detection | Temporal Action Proposal Generation

Abstract

Temporal action proposal generation (TAPG) is a challenging task that requires localizing action intervals in an untrimmed video. Intuitively, we as humans perceive an action through the interactions between actors, relevant objects, and the surrounding environment. Despite significant progress in TAPG, the vast majority of existing methods ignore this principle of human perception by applying a backbone network to a given video as a black box. In this paper, we propose to model these interactions with a multi-modal representation network, namely the Actors-Objects-Environment Interaction Network (AOE-Net). AOE-Net consists of two modules: a perception-based multi-modal representation (PMR) module and a boundary-matching module (BMM). Additionally, we introduce an adaptive attention mechanism (AAM) in PMR to focus only on the main actors (or relevant objects) and model the relationships among them. The PMR module represents each video snippet by a visual-linguistic feature, in which the main actors and surrounding environment are represented by visual information, whereas relevant objects are depicted by linguistic features through an image-text model. The BMM takes the sequence of visual-linguistic features as input and generates action proposals. Comprehensive experiments and extensive ablation studies on the ActivityNet-1.3 and THUMOS-14 datasets show that the proposed AOE-Net outperforms previous state-of-the-art methods, with remarkable performance and generalization for both TAPG and temporal action detection. To demonstrate the robustness and effectiveness of AOE-Net, we further conduct an ablation study on egocentric videos, i.e., the EPIC-KITCHENS 100 dataset. Source code is available upon acceptance.

Results

The same results are listed under four tasks: Video, Temporal Action Localization, Zero-Shot Learning, and Action Localization.

Dataset           Metric        Value    Model
THUMOS'14         AR@50         44.56    AOE-Net + Soft-NMS
THUMOS'14         AR@100        50.26    AOE-Net + Soft-NMS
THUMOS'14         AR@200        57.3     AOE-Net + Soft-NMS
THUMOS'14         AR@500        64.32    AOE-Net + Soft-NMS
THUMOS'14         AR@1000       68.19    AOE-Net + Soft-NMS
ActivityNet-1.3   AR@100        77.67    AOE-Net
ActivityNet-1.3   AUC (val)     69.71    AOE-Net
ActivityNet-1.3   AUC (test)    70.1     AOE-Net
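
The AR@N metric and the Soft-NMS post-processing named in the table can be illustrated with a minimal sketch. This is not the authors' evaluation code. It assumes proposals are (start, end, score) tuples for a single video, uses the Gaussian variant of Soft-NMS, and averages recall over temporal IoU thresholds from 0.5 to 0.95 in steps of 0.05. The function names, the `sigma` value, and the threshold grid are illustrative assumptions; official benchmark toolkits average over all videos and may use different settings.

```python
import math


def temporal_iou(a, b):
    """IoU of two (start, end) segments on the time axis."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def soft_nms(proposals, sigma=0.5, top_k=100):
    """Gaussian Soft-NMS over (start, end, score) proposals.

    Overlapping proposals are not discarded; their scores are decayed
    by exp(-IoU^2 / sigma). sigma and top_k are assumed defaults.
    """
    remaining = [list(p) for p in proposals]
    kept = []
    while remaining and len(kept) < top_k:
        # Pick the highest-scoring proposal still in the pool.
        best = max(range(len(remaining)), key=lambda i: remaining[i][2])
        start, end, score = remaining.pop(best)
        kept.append((start, end, score))
        # Decay the scores of the rest according to their overlap with it.
        for p in remaining:
            iou = temporal_iou((start, end), (p[0], p[1]))
            p[2] *= math.exp(-(iou ** 2) / sigma)
    return kept


def recall_at_n(proposals, ground_truth, n, iou_threshold):
    """Fraction of ground-truth segments matched by any top-n proposal."""
    if not ground_truth:
        return 0.0
    top = sorted(proposals, key=lambda p: p[2], reverse=True)[:n]
    hits = sum(
        any(temporal_iou(gt, (p[0], p[1])) >= iou_threshold for p in top)
        for gt in ground_truth
    )
    return hits / len(ground_truth)


def average_recall_at_n(proposals, ground_truth, n, thresholds):
    """Mean of recall@n over a list of temporal IoU thresholds."""
    recalls = [recall_at_n(proposals, ground_truth, n, t) for t in thresholds]
    return sum(recalls) / len(recalls)


# Toy single-video example (made-up numbers, not from the paper).
proposals = [(0.0, 10.0, 0.9), (1.0, 11.0, 0.8), (20.0, 30.0, 0.7)]
ground_truth = [(0.0, 10.0), (21.0, 31.0)]
thresholds = [t / 100 for t in range(50, 100, 5)]  # 0.50, 0.55, ..., 0.95

kept = soft_nms(proposals)
ar_at_2 = average_recall_at_n(kept, ground_truth, n=2, thresholds=thresholds)
print(kept)
print(ar_at_2)
```

In the toy example, the second proposal overlaps heavily with the first, so Soft-NMS lowers its score and the non-overlapping third proposal moves ahead of it in the ranking. That reranking is what makes the top-2 recall cover both ground-truth segments.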
