Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Learning Latent Sub-events in Activity Videos Using Temporal Attention Filters

AJ Piergiovanni, Chenyou Fan, Michael S. Ryoo

2016-05-26 · Activity Recognition In Videos · Action Classification · Human Activity Recognition · Action Recognition In Videos · Activity Recognition

Paper · PDF · Code

Abstract

In this paper, we newly introduce the concept of temporal attention filters, and describe how they can be used for human activity recognition from videos. Many high-level activities are often composed of multiple temporal parts (e.g., sub-events) with different durations/speeds, and our objective is to make the model explicitly learn such temporal structure using multiple attention filters and benefit from them. Our temporal filters are designed to be fully differentiable, allowing end-to-end training of the temporal filters together with the underlying frame-based or segment-based convolutional neural network architectures. This paper presents an approach of learning a set of optimal static temporal attention filters to be shared across different videos, and extends this approach to dynamically adjust attention filters per test video using recurrent long short-term memory networks (LSTMs). This allows our temporal attention filters to learn latent sub-events specific to each activity. We experimentally confirm that the proposed concept of temporal attention filters benefits activity recognition, and we visualize the learned latent sub-events.
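The abstract describes each temporal attention filter as a fully differentiable function that pools a variable-length sequence of per-frame features into a fixed number of summary vectors, one per latent sub-event. A minimal NumPy sketch of that pooling idea follows, using a bank of Gaussian kernels parameterized by a center, a stride between kernels, and a width; the function and parameter names are illustrative (in the paper these parameters are learned end-to-end, either as shared static values or predicted per video by an LSTM), not the authors' actual code.

```python
import numpy as np

def temporal_attention_filter(T, N, center, stride, sigma):
    """Build an N x T matrix of temporal attention weights.

    Each of the N rows is a normalized Gaussian over the T frame
    indices; row centers are spaced `stride` apart around `center`.
    All operations are differentiable in (center, stride, sigma),
    which is what allows end-to-end training in the paper.
    """
    t = np.arange(T, dtype=np.float64)                      # frame indices 0..T-1
    mu = center + (np.arange(N) - (N - 1) / 2.0) * stride   # per-row Gaussian centers
    w = np.exp(-0.5 * ((t[None, :] - mu[:, None]) / sigma) ** 2)
    return w / w.sum(axis=1, keepdims=True)                 # each row sums to 1

def apply_filter(features, weights):
    """Pool per-frame features (T x D) into N summary vectors (N x D)."""
    return weights @ features

# Toy usage: 30 frames of 8-dim features pooled into 4 sub-event vectors.
T, N, D = 30, 4, 8
weights = temporal_attention_filter(T, N, center=15.0, stride=5.0, sigma=2.0)
feats = np.random.randn(T, D)
pooled = apply_filter(feats, weights)   # shape (4, 8), one vector per latent sub-event
```

Because the output shape depends only on N and D, not on T, the same filter bank can pool videos of different lengths into a fixed-size representation for the downstream classifier.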

Results

Task | Dataset | Metric | Value | Model
Video | DogCentric | Accuracy | 98.55 | VTFSA
Temporal Action Localization | DogCentric | Accuracy | 98.55 | VTFSA
Zero-Shot Learning | DogCentric | Accuracy | 98.55 | VTFSA
Action Localization | DogCentric | Accuracy | 98.55 | VTFSA
Activity Recognition In Videos | DogCentric | Accuracy | 98.55 | VTFSA

Related Papers

ZKP-FedEval: Verifiable and Privacy-Preserving Federated Evaluation using Zero-Knowledge Proofs (2025-07-15)
SEZ-HARN: Self-Explainable Zero-shot Human Activity Recognition Network (2025-06-25)
Efficient Retail Video Annotation: A Robust Key Frame Generation Approach for Product and Customer Interaction Analysis (2025-06-17)
DeSPITE: Exploring Contrastive Deep Skeleton-Pointcloud-IMU-Text Embeddings for Advanced Point Cloud Human Activity Understanding (2025-06-16)
MORIC: CSI Delay-Doppler Decomposition for Robust Wi-Fi-based Human Activity Recognition (2025-06-15)
AgentSense: Virtual Sensor Data Generation Using LLM Agents in Simulated Home Environments (2025-06-13)
ScalableHD: Scalable and High-Throughput Hyperdimensional Computing Inference on Multi-Core CPUs (2025-06-10)
SurgBench: A Unified Large-Scale Benchmark for Surgical Video Analysis (2025-06-09)