Data sourced from the PWC Archive (CC-BY-SA 4.0).

TriDet: Temporal Action Detection with Relative Boundary Modeling

Dingfeng Shi, Yujie Zhong, Qiong Cao, Lin Ma, Jia Li, DaCheng Tao

2023-03-13 · CVPR 2023 · Tasks: Action Detection, Temporal Action Localization

Abstract

In this paper, we present TriDet, a one-stage framework for temporal action detection. Existing methods often produce imprecise boundary predictions because action boundaries in videos are ambiguous. To alleviate this problem, we propose a novel Trident-head that models each action boundary via an estimated relative probability distribution around the boundary. In the feature pyramid of TriDet, we propose an efficient Scalable-Granularity Perception (SGP) layer that mitigates the rank loss problem self-attention suffers on video features and aggregates information across different temporal granularities. Benefiting from the Trident-head and the SGP-based feature pyramid, TriDet achieves state-of-the-art performance on three challenging benchmarks, THUMOS14, HACS and EPIC-KITCHENS 100, at a lower computational cost than previous methods. For example, TriDet reaches an average mAP of 69.3% on THUMOS14, outperforming the previous best by 2.5% with only 74.6% of its latency. The code is available at https://github.com/sssste/TriDet.
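The Trident-head idea of modeling a boundary through a relative probability distribution can be sketched as follows: instead of regressing a boundary offset directly, each instant scores a fixed number of relative bins (distance 0..B before it for the start, after it for the end), and the boundary is read off as the expected distance under a softmax over those bins. Our reading is that the head combines a per-instant boundary response with a per-bin offset score; here that combination is done by addition, which is an assumption. This is a minimal illustrative sketch, not the official implementation from the repository above; the bin count and all function names (`bin_logits`, `expected_distance`, `decode_segment`) are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

NEG_INF = float("-inf")


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax; -inf entries get probability 0."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expected_distance(logits: Sequence[float]) -> float:
    """Expected distance (in bins) under a softmax over relative bins 0..B."""
    probs = softmax(logits)
    return sum(b * p for b, p in enumerate(probs))


def bin_logits(boundary_resp: Sequence[float], offset_scores: Sequence[float],
               t: int, direction: int) -> List[float]:
    """Score bin b as the boundary response at t + direction*b plus an offset score.

    Hypothetical combination rule (addition). direction=-1 looks backwards
    (start boundary), +1 looks forwards (end boundary). Positions outside the
    sequence get -inf so they receive zero probability.
    """
    out = []
    for b, off in enumerate(offset_scores):
        pos = t + direction * b
        if 0 <= pos < len(boundary_resp):
            out.append(boundary_resp[pos] + off)
        else:
            out.append(NEG_INF)
    return out


def decode_segment(t: int,
                   start_resp: Sequence[float], end_resp: Sequence[float],
                   start_offsets: Sequence[float], end_offsets: Sequence[float],
                   stride: float = 1.0) -> Tuple[float, float]:
    """Decode a (start, end) segment for instant t, scaled by the feature stride."""
    d_s = expected_distance(bin_logits(start_resp, start_offsets, t, -1))
    d_e = expected_distance(bin_logits(end_resp, end_offsets, t, +1))
    return (t - d_s) * stride, (t + d_e) * stride
```

With all-zero scores over five bins the distribution is uniform, so `decode_segment(10, ...)` places both boundaries two bins away from instant 10, giving roughly (8.0, 12.0); a sharp peak in the boundary response pulls the expectation onto that bin. Taking the expectation rather than the arg-max keeps the output continuous, which is one way a distribution-based head can soften ambiguous boundaries.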

Results

The same results are listed under four tasks: Video, Temporal Action Localization, Zero-Shot Learning, and Action Localization. All values are in %.

| Dataset | Model | Metric | Value (%) |
|---|---|---|---|
| HACS | TriDet (SlowFast) | Average mAP | 38.6 |
| HACS | TriDet (SlowFast) | mAP@0.5 | 56.7 |
| HACS | TriDet (SlowFast) | mAP@0.75 | 39.3 |
| HACS | TriDet (SlowFast) | mAP@0.95 | 11.7 |
| HACS | TriDet (I3D RGB) | Average mAP | 36.8 |
| HACS | TriDet (I3D RGB) | mAP@0.5 | 54.5 |
| HACS | TriDet (I3D RGB) | mAP@0.75 | 36.8 |
| HACS | TriDet (I3D RGB) | mAP@0.95 | 11.5 |
| ActivityNet-1.3 | TriDet (TSP features) | mAP | 36.8 |
| ActivityNet-1.3 | TriDet (TSP features) | mAP@0.5 | 54.7 |
| ActivityNet-1.3 | TriDet (TSP features) | mAP@0.75 | 38 |
| ActivityNet-1.3 | TriDet (TSP features) | mAP@0.95 | 8.4 |
| THUMOS’14 | TriDet (I3D features) | Average mAP (tIoU 0.3:0.7) | 69.3 |
| THUMOS’14 | TriDet (I3D features) | mAP@0.3 | 83.6 |
| THUMOS’14 | TriDet (I3D features) | mAP@0.4 | 80.1 |
| THUMOS’14 | TriDet (I3D features) | mAP@0.5 | 72.9 |
| THUMOS’14 | TriDet (I3D features) | mAP@0.6 | 62.4 |
| THUMOS’14 | TriDet (I3D features) | mAP@0.7 | 47.4 |
| EPIC-KITCHENS-100 | TriDet (verb) | Average mAP (tIoU 0.1:0.5) | 25.4 |
| EPIC-KITCHENS-100 | TriDet (verb) | mAP@0.1 | 28.6 |
| EPIC-KITCHENS-100 | TriDet (verb) | mAP@0.2 | 27.4 |
| EPIC-KITCHENS-100 | TriDet (verb) | mAP@0.3 | 26.1 |
| EPIC-KITCHENS-100 | TriDet (verb) | mAP@0.4 | 24.2 |
| EPIC-KITCHENS-100 | TriDet (verb) | mAP@0.5 | 20.8 |
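The average-mAP rows summarize the per-threshold rows: mAP is computed at each temporal IoU (tIoU) threshold between predicted and ground-truth segments, and the average is the mean over the listed thresholds. The sketch below, with hypothetical helper names, computes the tIoU of two segments and recomputes the THUMOS’14 and EPIC-KITCHENS-100 averages from the per-threshold values in the results table. The HACS and ActivityNet-1.3 averages cannot be checked this way, since they are conventionally taken over ten thresholds (0.5:0.05:0.95), of which only three are listed.

```python
from statistics import mean
from typing import Dict, Tuple

Segment = Tuple[float, float]  # (start, end) in seconds or frames


def temporal_iou(a: Segment, b: Segment) -> float:
    """Intersection-over-union of two 1-D segments."""
    inter = max(0.0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union if union > 0 else 0.0


def average_map(per_threshold: Dict[float, float]) -> float:
    """Mean of the mAP values reported at each tIoU threshold."""
    return mean(list(per_threshold.values()))


# Per-threshold mAP (%) taken from the results table.
THUMOS14 = {0.3: 83.6, 0.4: 80.1, 0.5: 72.9, 0.6: 62.4, 0.7: 47.4}
EPIC_VERB = {0.1: 28.6, 0.2: 27.4, 0.3: 26.1, 0.4: 24.2, 0.5: 20.8}

print(round(average_map(THUMOS14), 1))   # prints 69.3
print(round(average_map(EPIC_VERB), 1))  # prints 25.4
```

Because the per-threshold values are already rounded to one decimal, a recomputed average can in general differ from the reported one in the last digit; here both match.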
