Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


AEI: Actors-Environment Interaction with Adaptive Attention for Temporal Action Proposals Generation

Khoa Vo, Hyekang Joo, Kashu Yamazaki, Sang Truong, Kris Kitani, Minh-Triet Tran, Ngan Le

2021-10-21 · Action Detection · Temporal Action Proposal Generation
Paper · PDF · Code (official)

Abstract

Humans typically perceive the establishment of an action in a video through the interaction between an actor and the surrounding environment. An action starts when the main actor in the video begins to interact with the environment and ends when the main actor stops the interaction. Despite great progress in temporal action proposal generation, most existing works ignore this observation and leave their models to learn to propose actions as a black box. In this paper, we attempt to simulate this human ability by proposing the Actors-Environment Interaction (AEI) network to improve video representations for temporal action proposal generation. AEI contains two modules: a perception-based visual representation (PVR) and a boundary-matching module (BMM). PVR represents each video snippet by taking human-human and human-environment relations into consideration using the proposed adaptive attention mechanism. The video representation is then passed to BMM to generate action proposals. AEI is comprehensively evaluated on the ActivityNet-1.3 and THUMOS-14 datasets, on temporal action proposal and detection tasks, with two boundary-matching architectures (CNN-based and GCN-based) and two classifiers (Unet and P-GCN). AEI robustly outperforms state-of-the-art methods, with strong performance and generalization on both temporal action proposal generation and temporal action detection.
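The abstract describes PVR's adaptive attention only at a high level. As a rough, hypothetical illustration (not the paper's actual architecture), the idea of pooling actor and environment features with similarity-based weights can be sketched as scaled dot-product attention against an environment context vector:

```python
import numpy as np

def adaptive_attention(actor_feats, env_feat):
    """Toy attention pooling over actor and environment features.

    Illustrative sketch only: stacks actor features with the environment
    feature, weights each by softmax-normalized similarity to the
    environment context, and returns the weighted sum as the snippet
    representation. All names and shapes here are assumptions.
    """
    # (n_actors + 1, d): actor rows plus the environment row
    feats = np.vstack([actor_feats, env_feat[None, :]])
    # Scaled dot-product scores against the environment context
    scores = feats @ env_feat / np.sqrt(env_feat.shape[0])
    # Numerically stable softmax
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    # Weighted sum -> one snippet-level feature vector of shape (d,)
    return weights @ feats

# Example: 3 actors with 4-dim features plus a 4-dim environment feature
snippet_repr = adaptive_attention(np.random.rand(3, 4), np.random.rand(4))
```

With identical inputs the softmax degenerates to uniform weights, so the output equals the shared feature vector; in general, actors more similar to the environment context contribute more to the snippet representation.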

Results

Task                         | Dataset         | Metric     | Value | Model
Video                        | ActivityNet-1.3 | AR@100     | 77.24 | AEI-G
Video                        | ActivityNet-1.3 | AUC (test) | 70.09 | AEI-G
Video                        | ActivityNet-1.3 | AUC (val)  | 69.47 | AEI-G
Temporal Action Localization | ActivityNet-1.3 | AR@100     | 77.24 | AEI-G
Temporal Action Localization | ActivityNet-1.3 | AUC (test) | 70.09 | AEI-G
Temporal Action Localization | ActivityNet-1.3 | AUC (val)  | 69.47 | AEI-G
Zero-Shot Learning           | ActivityNet-1.3 | AR@100     | 77.24 | AEI-G
Zero-Shot Learning           | ActivityNet-1.3 | AUC (test) | 70.09 | AEI-G
Zero-Shot Learning           | ActivityNet-1.3 | AUC (val)  | 69.47 | AEI-G
Action Localization          | ActivityNet-1.3 | AR@100     | 77.24 | AEI-G
Action Localization          | ActivityNet-1.3 | AUC (test) | 70.09 | AEI-G
Action Localization          | ActivityNet-1.3 | AUC (val)  | 69.47 | AEI-G
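AR@100 in the table is the standard proposal-recall metric: the fraction of ground-truth action segments covered by the top 100 retained proposals, averaged over a range of temporal IoU (tIoU) thresholds. A minimal sketch of the computation (hypothetical helper names; the official ActivityNet evaluation sweeps tIoU thresholds from 0.5 to 0.95 in steps of 0.05):

```python
def t_iou(a, b):
    """Temporal IoU between two (start, end) segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0

def average_recall(proposals, ground_truths, thresholds):
    """Recall of ground truths at each tIoU threshold, averaged.

    For AR@100, `proposals` would be the top-100 scored proposals
    for a video; this sketch takes them as given.
    """
    recalls = []
    for t in thresholds:
        matched = sum(
            1 for g in ground_truths
            if any(t_iou(p, g) >= t for p in proposals)
        )
        recalls.append(matched / len(ground_truths))
    return sum(recalls) / len(recalls)

# One proposal matches a ground truth exactly, the other ground truth
# is missed at every threshold, so average recall is 0.5.
ar = average_recall([(0, 10), (20, 30)], [(0, 10), (50, 60)],
                    [0.5, 0.75, 0.95])
```

The AUC values are the area under the AR-vs-average-number-of-proposals curve obtained by repeating this computation as the proposal budget varies.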

Related Papers

CBF-AFA: Chunk-Based Multi-SSL Fusion for Automatic Fluency Assessment (2025-06-25)
MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans (2025-06-25)
Distributed Activity Detection for Cell-Free Hybrid Near-Far Field Communications (2025-06-17)
Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm (2025-06-03)
Attention Is Not Always the Answer: Optimizing Voice Activity Detection with Simple Feature Fusion (2025-06-02)
Joint Activity Detection and Channel Estimation for Massive Connectivity: Where Message Passing Meets Score-Based Generative Priors (2025-05-31)
Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM (2025-05-29)
Robust Activity Detection for Massive Random Access (2025-05-21)