Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Temporal Aggregate Representations for Long-Range Video Understanding

Fadime Sener, Dipika Singhania, Angela Yao

Published: 2020-06-01
Venue: ECCV 2020
Tasks: Action Anticipation, Future prediction, Video Segmentation, Video Semantic Segmentation, Video Understanding, Action Recognition

Abstract

Future prediction, especially in long-range videos, requires reasoning from current and past observations. In this work, we address questions of temporal extent, scaling, and level of semantic abstraction with a flexible multi-granular temporal aggregation framework. We show that it is possible to achieve state of the art in both next action and dense anticipation with simple techniques such as max-pooling and attention. To demonstrate the anticipation capabilities of our model, we conduct experiments on Breakfast, 50Salads, and EPIC-Kitchens datasets, where we achieve state-of-the-art results. With minimal modifications, our model can also be extended for video segmentation and action recognition.
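The abstract names max-pooling over several temporal granularities as one of the model's building blocks. The sketch below illustrates only that general idea, not the authors' architecture: for each temporal extent it takes the most recent frames, splits them into a fixed number of segments, and max-pools each segment. The function names, the `spans` and `num_segments` parameters, and the toy features are all assumptions made for illustration; the attention component mentioned in the abstract is not modelled here.

```python
from typing import List, Sequence

Vector = List[float]


def max_pool(frames: Sequence[Vector]) -> Vector:
    """Element-wise max over a non-empty sequence of equal-length vectors."""
    return [max(values) for values in zip(*frames)]


def multi_granular_pool(
    frames: Sequence[Vector],
    spans: Sequence[int],
    num_segments: int,
) -> List[List[Vector]]:
    """Hypothetical helper: pool the observed frames at several temporal extents.

    For every span (a positive number of most recent frames), the frames are
    split into `num_segments` contiguous chunks and each chunk is max-pooled.
    Returns one list of pooled vectors per span.
    """
    if not frames:
        raise ValueError("frames must not be empty")
    pooled = []
    for span in spans:
        recent = list(frames[-span:])  # the last `span` frames (or all, if fewer)
        n = len(recent)
        segments = []
        for k in range(num_segments):
            start = k * n // num_segments
            end = (k + 1) * n // num_segments
            # Fall back to a single frame when there are fewer frames than segments.
            chunk = recent[start:end] or recent[start:start + 1]
            segments.append(max_pool(chunk))
        pooled.append(segments)
    return pooled


# Toy per-frame features (assumed values, 4 frames with 2 dimensions each).
frames = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [2.0, 5.0]]
short_range, long_range = multi_granular_pool(frames, spans=(2, 4), num_segments=2)
```

In this toy setup, `short_range` summarises only the last two frames while `long_range` covers all four, so a downstream model would see the same video at two temporal extents.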

Results

Task | Dataset | Metric | Value | Model
Activity Recognition | Assembly101 | Actions Recall@5 | 8.53 | TempAgg
Activity Recognition | Assembly101 | Objects Recall@5 | 26.27 | TempAgg
Activity Recognition | Assembly101 | Verbs Recall@5 | 59.11 | TempAgg
Action Recognition | Assembly101 | Actions Recall@5 | 8.53 | TempAgg
Action Recognition | Assembly101 | Objects Recall@5 | 26.27 | TempAgg
Action Recognition | Assembly101 | Verbs Recall@5 | 59.11 | TempAgg
Action Anticipation | Assembly101 | Actions Recall@5 | 8.53 | TempAgg
Action Anticipation | Assembly101 | Objects Recall@5 | 26.27 | TempAgg
Action Anticipation | Assembly101 | Verbs Recall@5 | 59.11 | TempAgg
2D Human Pose Estimation | Assembly101 | Actions Recall@5 | 8.53 | TempAgg
2D Human Pose Estimation | Assembly101 | Objects Recall@5 | 26.27 | TempAgg
2D Human Pose Estimation | Assembly101 | Verbs Recall@5 | 59.11 | TempAgg
Action Recognition In Videos | Assembly101 | Actions Recall@5 | 8.53 | TempAgg
Action Recognition In Videos | Assembly101 | Objects Recall@5 | 26.27 | TempAgg
Action Recognition In Videos | Assembly101 | Verbs Recall@5 | 59.11 | TempAgg
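
The Recall@5 figures in the results table can be read with a rough sketch like the one below. It assumes the simplest reading of the metric: the percentage of samples whose true class is among the five highest-scoring predictions. Benchmarks sometimes average this per class instead, and the page does not say which variant Assembly101 uses, so this is an illustration rather than the official evaluation code. The `recall_at_k` name and the toy scores are hypothetical.

```python
from typing import Sequence


def recall_at_k(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 5,
) -> float:
    """Percentage of samples whose true label is among the top-k scored classes.

    Assumption: sample-level averaging (not class-mean averaging).
    """
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and equally long")
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k classes with the highest scores for this sample.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy example with 3 classes and 2 samples (assumed values).
toy_scores = [[0.1, 0.9, 0.0], [0.8, 0.05, 0.15]]
toy_labels = [1, 2]
top1 = recall_at_k(toy_scores, toy_labels, k=1)
top2 = recall_at_k(toy_scores, toy_labels, k=2)
```

Widening `k` can only keep or raise the score, which is why top-5 numbers are typically reported for tasks with many fine-grained classes.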
