TasksSotADatasetsPapersMethodsSubmitAbout
Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Explore

Notable BenchmarksAll SotADatasetsPapersMethods

Community

Submit ResultsAbout

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Papers/Semantically Guided Representation Learning For Action Ant...

Semantically Guided Representation Learning For Action Anticipation

Anxhelo Diko, Danilo Avola, Bardh Prenkaj, Federico Fontana, Luigi Cinque

2024-07-02Representation LearningAction Anticipation
PaperPDFCode(official)

Abstract

Action anticipation is the task of forecasting future activity from a partially observed sequence of events. However, this task is exposed to intrinsic future uncertainty and the difficulty of reasoning upon interconnected actions. Unlike previous works that focus on extrapolating better visual and temporal information, we concentrate on learning action representations that are aware of their semantic interconnectivity based on prototypical action patterns and contextual co-occurrences. To this end, we propose the novel Semantically Guided Representation Learning (S-GEAR) framework. S-GEAR learns visual action prototypes and leverages language models to structure their relationship, inducing semanticity. To gather insights on S-GEAR's effectiveness, we test it on four action anticipation benchmarks, obtaining improved results compared to previous works: +3.5, +2.7, and +3.5 absolute points on Top-1 Accuracy on Epic-Kitchen 55, EGTEA Gaze+ and 50 Salads, respectively, and +0.8 on Top-5 Recall on Epic-Kitchens 100. We further observe that S-GEAR effectively transfers the geometric associations between actions from language to visual prototypes. Finally, S-GEAR opens new research frontiers in anticipation tasks by demonstrating the intricate impact of action semantic interconnectivity.

Results

TaskDatasetMetricValueModel
Activity RecognitionEPIC-KITCHENS-100Recall@519.9S-GEAR
Action RecognitionEPIC-KITCHENS-100Recall@519.9S-GEAR
Action AnticipationEPIC-KITCHENS-100Recall@519.9S-GEAR
2D Human Pose EstimationEPIC-KITCHENS-100Recall@519.9S-GEAR
Action Recognition In VideosEPIC-KITCHENS-100Recall@519.9S-GEAR

Related Papers

Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper2025-07-20Spectral Bellman Method: Unifying Representation and Exploration in RL2025-07-17Boosting Team Modeling through Tempo-Relational Representation Learning2025-07-17Similarity-Guided Diffusion for Contrastive Sequential Recommendation2025-07-16Are encoders able to learn landmarkers for warm-starting of Hyperparameter Optimization?2025-07-16Language-Guided Contrastive Audio-Visual Masked Autoencoder with Automatically Generated Audio-Visual-Text Triplets from Videos2025-07-16A Mixed-Primitive-based Gaussian Splatting Method for Surface Reconstruction2025-07-15Dual Dimensions Geometric Representation Learning Based Document Dewarping2025-07-11