Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Bridge-Prompt: Towards Ordinal Action Understanding in Instructional Videos

Muheng Li, Lei Chen, Yueqi Duan, Zhilan Hu, Jianjiang Feng, Jie Zhou, Jiwen Lu

2022-03-26 · CVPR 2022
Tasks: Action Segmentation · Human Activity Recognition · Activity Recognition · Action Understanding
Paper · PDF · Code (official)

Abstract

Action recognition models have shown a promising capability to classify human actions in short video clips. In real scenarios, multiple correlated human actions commonly occur in particular orders, forming semantically meaningful human activities. Conventional action recognition approaches focus on analyzing single actions; however, they fail to fully reason about the contextual relations between adjacent actions, which provide potential temporal logic for understanding long videos. In this paper, we propose a prompt-based framework, Bridge-Prompt (Br-Prompt), to model the semantics across adjacent actions, so that it simultaneously exploits both out-of-context and contextual information from a series of ordinal actions in instructional videos. More specifically, we reformulate the individual action labels as integrated text prompts for supervision, which bridge the gap between individual action semantics. The generated text prompts are paired with the corresponding video clips, and together they co-train the text encoder and the video encoder via a contrastive approach. The learned vision encoder has a stronger capability for ordinal-action-related downstream tasks, e.g., action segmentation and human activity recognition. We evaluate the performance of our approach on several video datasets: Georgia Tech Egocentric Activities (GTEA), 50Salads, and the Breakfast dataset. Br-Prompt achieves state-of-the-art results on multiple benchmarks. Code is available at https://github.com/ttlmh/Bridge-Prompt
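The abstract describes pairing generated text prompts with video clips and co-training the two encoders "via a contrastive approach", i.e. a CLIP-style objective where matching (clip, prompt) pairs are pulled together and mismatched pairs pushed apart. The following is a minimal, dependency-free sketch of such a symmetric InfoNCE loss over pre-computed embeddings; all names are hypothetical, and this is not the authors' exact loss (Bridge-Prompt's full objective also involves ordinal and statistical prompts), only an illustration of the contrastive mechanism.

```python
import math

def l2_normalize(v):
    """Project an embedding onto the unit sphere (standard before cosine similarity)."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def contrastive_loss(video_embs, text_embs, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired (video clip, text prompt) embeddings.

    video_embs[i] and text_embs[i] are assumed to be a matching pair; every
    other combination in the batch serves as a negative.
    """
    v = [l2_normalize(e) for e in video_embs]
    t = [l2_normalize(e) for e in text_embs]
    n = len(v)
    # Logit matrix: cosine similarity of every clip with every prompt,
    # sharpened by the temperature.
    logits = [[sum(a * b for a, b in zip(v[i], t[j])) / temperature
               for j in range(n)] for i in range(n)]

    def cross_entropy(rows):
        # -log softmax at the diagonal (matching) index, averaged over the batch,
        # with the usual max-shift for numerical stability.
        loss = 0.0
        for i, row in enumerate(rows):
            m = max(row)
            log_z = m + math.log(sum(math.exp(x - m) for x in row))
            loss += log_z - row[i]
        return loss / n

    # Average the video-to-text and text-to-video directions.
    logits_t = [list(col) for col in zip(*logits)]
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits_t))
```

With correctly paired embeddings the loss is near zero, while permuting the text side (breaking the pairing) drives it up, which is what encourages both encoders to agree on a shared embedding space.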

Results

Task                | Dataset   | Metric | Value | Model
Action Localization | 50 Salads | Acc    | 88.1  | Br-Prompt+ASFormer
Action Localization | 50 Salads | Edit   | 83.8  | Br-Prompt+ASFormer
Action Localization | 50 Salads | F1@10% | 89.2  | Br-Prompt+ASFormer
Action Localization | 50 Salads | F1@25% | 87.8  | Br-Prompt+ASFormer
Action Localization | 50 Salads | F1@50% | 81.3  | Br-Prompt+ASFormer
Action Localization | GTEA      | Acc    | 81.2  | Br-Prompt+ASFormer
Action Localization | GTEA      | Edit   | 91.6  | Br-Prompt+ASFormer
Action Localization | GTEA      | F1@10% | 94.1  | Br-Prompt+ASFormer
Action Localization | GTEA      | F1@25% | 92    | Br-Prompt+ASFormer
Action Localization | GTEA      | F1@50% | 83    | Br-Prompt+ASFormer
Action Segmentation | 50 Salads | Acc    | 88.1  | Br-Prompt+ASFormer
Action Segmentation | 50 Salads | Edit   | 83.8  | Br-Prompt+ASFormer
Action Segmentation | 50 Salads | F1@10% | 89.2  | Br-Prompt+ASFormer
Action Segmentation | 50 Salads | F1@25% | 87.8  | Br-Prompt+ASFormer
Action Segmentation | 50 Salads | F1@50% | 81.3  | Br-Prompt+ASFormer
Action Segmentation | GTEA      | Acc    | 81.2  | Br-Prompt+ASFormer
Action Segmentation | GTEA      | Edit   | 91.6  | Br-Prompt+ASFormer
Action Segmentation | GTEA      | F1@10% | 94.1  | Br-Prompt+ASFormer
Action Segmentation | GTEA      | F1@25% | 92    | Br-Prompt+ASFormer
Action Segmentation | GTEA      | F1@50% | 83    | Br-Prompt+ASFormer

Related Papers

ZKP-FedEval: Verifiable and Privacy-Preserving Federated Evaluation using Zero-Knowledge Proofs (2025-07-15)
Self-supervised pretraining of vision transformers for animal behavioral analysis and neural encoding (2025-07-13)
LLaVA-Pose: Enhancing Human Pose and Action Understanding via Keypoint-Integrated Instruction Tuning (2025-06-26)
SEZ-HARN: Self-Explainable Zero-shot Human Activity Recognition Network (2025-06-25)
Efficient Retail Video Annotation: A Robust Key Frame Generation Approach for Product and Customer Interaction Analysis (2025-06-17)
DeSPITE: Exploring Contrastive Deep Skeleton-Pointcloud-IMU-Text Embeddings for Advanced Point Cloud Human Activity Understanding (2025-06-16)
MORIC: CSI Delay-Doppler Decomposition for Robust Wi-Fi-based Human Activity Recognition (2025-06-15)
AgentSense: Virtual Sensor Data Generation Using LLM Agents in Simulated Home Environments (2025-06-13)