Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Technical Report: Temporal Aggregate Representations

Fadime Sener, Dibyadip Chatterjee, Angela Yao

2021-06-06
Tasks: Action Anticipation, Video Understanding, Action Recognition

Abstract

This technical report extends our work presented in [9] with more experiments. In [9], we tackle long-term video understanding, which requires reasoning from current and past or future observations and raises several fundamental questions. How should temporal or sequential relationships be modelled? What temporal extent of information and context needs to be processed? At what temporal scale should they be derived? [9] addresses these questions with a flexible multi-granular temporal aggregation framework. In this report, we conduct further experiments with this framework on different tasks and a new dataset, EPIC-KITCHENS-100.

Results

Task                          Dataset                   Metric    Value  Model
----------------------------  ------------------------  --------  -----  -------
Activity Recognition          EPIC-KITCHENS-100         Action@1  45.26  TempAgg
Activity Recognition          EPIC-KITCHENS-100         Noun@1    53.35  TempAgg
Activity Recognition          EPIC-KITCHENS-100         Verb@1    66     TempAgg
Activity Recognition          EPIC-KITCHENS-100 (test)  Recall@5  12.6   TempAgg
Activity Recognition          EPIC-KITCHENS-100         Recall@5  14.73  TempAgg
Action Recognition            EPIC-KITCHENS-100         Action@1  45.26  TempAgg
Action Recognition            EPIC-KITCHENS-100         Noun@1    53.35  TempAgg
Action Recognition            EPIC-KITCHENS-100         Verb@1    66     TempAgg
Action Recognition            EPIC-KITCHENS-100 (test)  Recall@5  12.6   TempAgg
Action Recognition            EPIC-KITCHENS-100         Recall@5  14.73  TempAgg
Action Anticipation           EPIC-KITCHENS-100 (test)  Recall@5  12.6   TempAgg
Action Anticipation           EPIC-KITCHENS-100         Recall@5  14.73  TempAgg
2D Human Pose Estimation      EPIC-KITCHENS-100 (test)  Recall@5  12.6   TempAgg
2D Human Pose Estimation      EPIC-KITCHENS-100         Recall@5  14.73  TempAgg
Action Recognition In Videos  EPIC-KITCHENS-100 (test)  Recall@5  12.6   TempAgg
Action Recognition In Videos  EPIC-KITCHENS-100         Recall@5  14.73  TempAgg
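
For readers unfamiliar with the Recall@5 metric listed above, the following is a minimal sketch of how a class-mean top-k recall can be computed. It assumes the class-mean variant commonly used for EPIC-KITCHENS-100 anticipation; this page does not state the exact evaluation protocol behind the reported values. The function name `class_mean_topk_recall` and the toy inputs are hypothetical and are not taken from the authors' code.

```python
from collections import defaultdict


def class_mean_topk_recall(scores, labels, k=5):
    """Average, over the classes present in `labels`, of the fraction of
    samples whose true class is among the k highest-scoring predictions.

    scores: list of per-sample score lists (one score per class)
    labels: list of integer ground-truth class indices
    """
    if not labels:
        raise ValueError("labels must not be empty")

    hits = defaultdict(int)    # per-class count of top-k hits
    totals = defaultdict(int)  # per-class count of samples

    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest scores for this sample.
        topk = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        totals[label] += 1
        if label in topk:
            hits[label] += 1

    # Recall is computed per class first, then averaged, so that frequent
    # classes do not dominate the result.
    per_class = [hits[c] / totals[c] for c in totals]
    return sum(per_class) / len(per_class)


# Toy example (hypothetical data): three samples, three classes.
toy_scores = [
    [0.9, 0.1, 0.0],
    [0.2, 0.7, 0.1],
    [0.6, 0.3, 0.1],
]
toy_labels = [0, 1, 1]
top1 = class_mean_topk_recall(toy_scores, toy_labels, k=1)
top2 = class_mean_topk_recall(toy_scores, toy_labels, k=2)
```

Averaging per class rather than per sample matters on long-tailed label sets, where a per-sample average would mostly reflect the most frequent actions.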
