Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Group Contextualization for Video Recognition

Yanbin Hao, Hao Zhang, Chong-Wah Ngo, Xiangnan He

2022-03-18 · CVPR 2022 · Video Recognition · Egocentric Activity Recognition · Action Recognition

Paper · PDF · Code (official)

Abstract

Learning discriminative representations from the complex spatio-temporal dynamic space is essential for video recognition. On top of stylized spatio-temporal computational units, further refining the learnt features with axial contexts has been shown to be promising toward this goal. However, previous works generally focus on utilizing a single kind of context to calibrate all feature channels, and can hardly cope with diverse video activities. The problem can be tackled with pair-wise spatio-temporal attentions that recompute feature responses with cross-axis contexts, but at the expense of heavy computation. In this paper, we propose an efficient feature refinement method that decomposes the feature channels into several groups and refines them separately with different axial contexts in parallel. We refer to this lightweight feature calibration as group contextualization (GC). Specifically, we design a family of efficient element-wise calibrators, i.e., ECal-G/S/T/L, whose axial contexts are information dynamics aggregated from the other axes either globally or locally, to contextualize feature channel groups. The GC module can be densely plugged into each residual layer of off-the-shelf video networks. With little computational overhead, consistent improvements are observed when GC is plugged into different networks. Because the calibrators embed features with four different kinds of contexts in parallel, the learnt representation is expected to be more resilient to diverse types of activities. On videos with rich temporal variations, GC empirically boosts the performance of 2D-CNNs (e.g., TSN and TSM) to a level comparable to state-of-the-art video networks. Code is available at https://github.com/haoyanbin918/Group-Contextualization.
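The core idea in the abstract (split channels into groups, calibrate each group with a different axial context in parallel, concatenate) can be sketched as follows. This is a minimal, parameter-free NumPy illustration, not the paper's implementation: the real ECal-G/S/T/L units use learned transforms (e.g., small convolutions or FC layers) to produce their gates, whereas here each calibrator simply averages out the other axes and applies a sigmoid gate. All function names (`ecal_g`, `ecal_s`, `ecal_t`, `ecal_l`, `group_contextualize`) are illustrative, not the repo's API.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ecal_g(x):
    # Global calibrator: one gate per channel from global average pooling.
    ctx = x.mean(axis=(1, 2, 3), keepdims=True)   # (C, 1, 1, 1)
    return x * sigmoid(ctx)

def ecal_s(x):
    # Spatial calibrator: gate varies over channel and spatial position.
    ctx = x.mean(axis=1, keepdims=True)           # (C, 1, H, W)
    return x * sigmoid(ctx)

def ecal_t(x):
    # Temporal calibrator: gate varies over channel and time.
    ctx = x.mean(axis=(2, 3), keepdims=True)      # (C, T, 1, 1)
    return x * sigmoid(ctx)

def ecal_l(x, k=3):
    # Local temporal calibrator: gate from a k-frame moving average.
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (0, 0), (0, 0)), mode="edge")
    ctx = np.stack([xp[:, t:t + k].mean(axis=1) for t in range(x.shape[1])],
                   axis=1)                        # (C, T, H, W)
    ctx = ctx.mean(axis=(2, 3), keepdims=True)    # keep the gate lightweight
    return x * sigmoid(ctx)

def group_contextualize(x):
    """Group contextualization sketch for one feature map x of shape
    (C, T, H, W): split channels into 4 groups, refine each group with a
    different axial calibrator in parallel, and concatenate the results.
    C must be divisible by 4 in this simplified version."""
    groups = np.split(x, 4, axis=0)
    calibrators = [ecal_g, ecal_s, ecal_t, ecal_l]
    return np.concatenate([f(g) for f, g in zip(calibrators, groups)], axis=0)
```

Because each calibrator is element-wise and operates on only a quarter of the channels, the extra cost over the backbone is small, which matches the "little computational overhead" claim in the abstract; in the real module the refined output is added back inside each residual layer.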

Results

Task                 | Dataset                 | Metric           | Value | Model
Activity Recognition | Diving-48               | Accuracy         | 87.6  | GC-TDN
Activity Recognition | Something-Something V2  | GFLOPs           | 110.1 | GC-TDN Ensemble (R50, 8+16)
Activity Recognition | Something-Something V2  | Parameters (M)   | 27.4  | GC-TDN Ensemble (R50, 8+16)
Activity Recognition | Something-Something V2  | Top-1 Accuracy   | 67.8  | GC-TDN Ensemble (R50, 8+16)
Activity Recognition | Something-Something V2  | Top-5 Accuracy   | 91.2  | GC-TDN Ensemble (R50, 8+16)
Activity Recognition | EGTEA                   | Average Accuracy | 65.1  | GC-TSM
Action Recognition   | Diving-48               | Accuracy         | 87.6  | GC-TDN
Action Recognition   | Something-Something V2  | GFLOPs           | 110.1 | GC-TDN Ensemble (R50, 8+16)
Action Recognition   | Something-Something V2  | Parameters (M)   | 27.4  | GC-TDN Ensemble (R50, 8+16)
Action Recognition   | Something-Something V2  | Top-1 Accuracy   | 67.8  | GC-TDN Ensemble (R50, 8+16)
Action Recognition   | Something-Something V2  | Top-5 Accuracy   | 91.2  | GC-TDN Ensemble (R50, 8+16)

Related Papers

A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
DVFL-Net: A Lightweight Distilled Video Focal Modulation Network for Spatio-Temporal Action Recognition (2025-07-16)
Zero-shot Skeleton-based Action Recognition with Prototype-guided Feature Alignment (2025-07-01)
EgoAdapt: Adaptive Multisensory Distillation and Policy Learning for Efficient Egocentric Perception (2025-06-26)
Feature Hallucination for Self-supervised Action Recognition (2025-06-25)
CARMA: Context-Aware Situational Grounding of Human-Robot Group Interactions by Combining Vision-Language Models with Object and Action Recognition (2025-06-25)
Including Semantic Information via Word Embeddings for Skeleton-based Action Recognition (2025-06-23)
Adapting Vision-Language Models for Evaluating World Models (2025-06-22)