
R-C3D: Region Convolutional 3D Network for Temporal Activity Detection

Huijuan Xu, Abir Das, Kate Saenko

2017-03-22 | ICCV 2017 10 | Action Detection, Activity Detection, General Classification, Action Recognition In Videos

Abstract

We address the problem of activity detection in continuous, untrimmed video streams. This is a difficult task that requires extracting meaningful spatio-temporal features to capture activities and accurately localizing the start and end times of each activity. We introduce a new model, Region Convolutional 3D Network (R-C3D), which encodes the video streams using a three-dimensional fully convolutional network, then generates candidate temporal regions containing activities, and finally classifies selected regions into specific activities. Computation is saved by sharing convolutional features between the proposal and the classification pipelines. The entire model is trained end-to-end with jointly optimized localization and classification losses. R-C3D is faster than existing methods (569 frames per second on a single Titan X Maxwell GPU) and achieves state-of-the-art results on THUMOS'14. We further demonstrate that our model is a general activity detection framework that does not rely on assumptions about particular dataset properties by evaluating our approach on ActivityNet and Charades. Our code is available at http://ai.bu.edu/r-c3d/.
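The abstract mentions jointly optimized localization and classification losses without giving their form. As a rough illustration only, the sketch below combines a softmax cross-entropy term with a smooth L1 regression term, a pairing common in region-based detectors. The function names, the weight `lam`, the use of label 0 as background, and the two-value (center, length) offsets are all assumptions of this sketch and are not taken from the page.

```python
import math

def cross_entropy(logits, label):
    """Softmax cross-entropy for one proposal (numerically stable)."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[label]

def smooth_l1(pred, target):
    """Smooth L1 distance summed over the predicted offsets."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total

def joint_loss(logits, labels, offsets, target_offsets, lam=1.0):
    """Mean classification loss plus lam times mean regression loss.

    Regression is counted only for proposals with a non-background label.
    Treating label 0 as background is an assumption of this sketch.
    """
    n = len(labels)
    cls = sum(cross_entropy(l, y) for l, y in zip(logits, labels)) / n
    pos = [i for i, y in enumerate(labels) if y != 0]
    reg = sum(smooth_l1(offsets[i], target_offsets[i]) for i in pos)
    reg /= max(len(pos), 1)
    return cls + lam * reg
```

Because both terms are summed into one scalar, a single backward pass in an autodiff framework would update the shared features for both objectives, which is the general idea behind end-to-end joint training.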

Results

| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Video | THUMOS’14 | mAP IOU@0.1 | 54.5 | R-C3D |
| Video | THUMOS’14 | mAP IOU@0.2 | 51.5 | R-C3D |
| Video | THUMOS’14 | mAP IOU@0.3 | 44.8 | R-C3D |
| Video | THUMOS’14 | mAP IOU@0.4 | 35.6 | R-C3D |
| Video | THUMOS’14 | mAP IOU@0.5 | 28.9 | R-C3D |
| Temporal Action Localization | THUMOS’14 | mAP IOU@0.1 | 54.5 | R-C3D |
| Temporal Action Localization | THUMOS’14 | mAP IOU@0.2 | 51.5 | R-C3D |
| Temporal Action Localization | THUMOS’14 | mAP IOU@0.3 | 44.8 | R-C3D |
| Temporal Action Localization | THUMOS’14 | mAP IOU@0.4 | 35.6 | R-C3D |
| Temporal Action Localization | THUMOS’14 | mAP IOU@0.5 | 28.9 | R-C3D |
| Zero-Shot Learning | THUMOS’14 | mAP IOU@0.1 | 54.5 | R-C3D |
| Zero-Shot Learning | THUMOS’14 | mAP IOU@0.2 | 51.5 | R-C3D |
| Zero-Shot Learning | THUMOS’14 | mAP IOU@0.3 | 44.8 | R-C3D |
| Zero-Shot Learning | THUMOS’14 | mAP IOU@0.4 | 35.6 | R-C3D |
| Zero-Shot Learning | THUMOS’14 | mAP IOU@0.5 | 28.9 | R-C3D |
| Activity Recognition | THUMOS’14 | mAP@0.1 | 54.5 | Single-stream R-C3D (two-way buffer) |
| Activity Recognition | THUMOS’14 | mAP@0.2 | 51.5 | Single-stream R-C3D (two-way buffer) |
| Activity Recognition | THUMOS’14 | mAP@0.3 | 44.8 | Single-stream R-C3D (two-way buffer) |
| Activity Recognition | THUMOS’14 | mAP@0.4 | 35.6 | Single-stream R-C3D (two-way buffer) |
| Activity Recognition | THUMOS’14 | mAP@0.5 | 28.9 | Single-stream R-C3D (two-way buffer) |
| Activity Recognition | THUMOS’14 | mAP@0.1 | 51.6 | Single-stream R-C3D (one-way buffer) |
| Activity Recognition | THUMOS’14 | mAP@0.2 | 49.2 | Single-stream R-C3D (one-way buffer) |
| Activity Recognition | THUMOS’14 | mAP@0.3 | 42.8 | Single-stream R-C3D (one-way buffer) |
| Activity Recognition | THUMOS’14 | mAP@0.4 | 33.4 | Single-stream R-C3D (one-way buffer) |
| Activity Recognition | THUMOS’14 | mAP@0.5 | 27 | Single-stream R-C3D (one-way buffer) |
| Action Localization | THUMOS’14 | mAP IOU@0.1 | 54.5 | R-C3D |
| Action Localization | THUMOS’14 | mAP IOU@0.2 | 51.5 | R-C3D |
| Action Localization | THUMOS’14 | mAP IOU@0.3 | 44.8 | R-C3D |
| Action Localization | THUMOS’14 | mAP IOU@0.4 | 35.6 | R-C3D |
| Action Localization | THUMOS’14 | mAP IOU@0.5 | 28.9 | R-C3D |
| Action Detection | Charades | mAP | 12.4 | R-C3D |
| Action Recognition | THUMOS’14 | mAP@0.1 | 54.5 | Single-stream R-C3D (two-way buffer) |
| Action Recognition | THUMOS’14 | mAP@0.2 | 51.5 | Single-stream R-C3D (two-way buffer) |
| Action Recognition | THUMOS’14 | mAP@0.3 | 44.8 | Single-stream R-C3D (two-way buffer) |
| Action Recognition | THUMOS’14 | mAP@0.4 | 35.6 | Single-stream R-C3D (two-way buffer) |
| Action Recognition | THUMOS’14 | mAP@0.5 | 28.9 | Single-stream R-C3D (two-way buffer) |
| Action Recognition | THUMOS’14 | mAP@0.1 | 51.6 | Single-stream R-C3D (one-way buffer) |
| Action Recognition | THUMOS’14 | mAP@0.2 | 49.2 | Single-stream R-C3D (one-way buffer) |
| Action Recognition | THUMOS’14 | mAP@0.3 | 42.8 | Single-stream R-C3D (one-way buffer) |
| Action Recognition | THUMOS’14 | mAP@0.4 | 33.4 | Single-stream R-C3D (one-way buffer) |
| Action Recognition | THUMOS’14 | mAP@0.5 | 27 | Single-stream R-C3D (one-way buffer) |
| Action Recognition In Videos | THUMOS’14 | mAP@0.1 | 54.5 | Single-stream R-C3D (two-way buffer) |
| Action Recognition In Videos | THUMOS’14 | mAP@0.2 | 51.5 | Single-stream R-C3D (two-way buffer) |
| Action Recognition In Videos | THUMOS’14 | mAP@0.3 | 44.8 | Single-stream R-C3D (two-way buffer) |
| Action Recognition In Videos | THUMOS’14 | mAP@0.4 | 35.6 | Single-stream R-C3D (two-way buffer) |
| Action Recognition In Videos | THUMOS’14 | mAP@0.5 | 28.9 | Single-stream R-C3D (two-way buffer) |
| Action Recognition In Videos | THUMOS’14 | mAP@0.1 | 51.6 | Single-stream R-C3D (one-way buffer) |
| Action Recognition In Videos | THUMOS’14 | mAP@0.2 | 49.2 | Single-stream R-C3D (one-way buffer) |
| Action Recognition In Videos | THUMOS’14 | mAP@0.3 | 42.8 | Single-stream R-C3D (one-way buffer) |
| Action Recognition In Videos | THUMOS’14 | mAP@0.4 | 33.4 | Single-stream R-C3D (one-way buffer) |
| Action Recognition In Videos | THUMOS’14 | mAP@0.5 | 27 | Single-stream R-C3D (one-way buffer) |
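The THUMOS’14 metrics are reported at IoU thresholds from 0.1 to 0.5. In temporal detection, this overlap is usually measured between a predicted (start, end) segment and a ground-truth segment. The sketch below shows only that overlap test. The function names and the example segments are hypothetical, and the page does not specify the evaluation script that produced the numbers above.

```python
def temporal_iou(pred, gt):
    """Intersection over union of two (start, end) intervals in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def hits_at_thresholds(pred, gt, thresholds=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Report, per threshold, whether the prediction overlaps enough to match."""
    iou = temporal_iou(pred, gt)
    return {t: iou >= t for t in thresholds}

# Hypothetical example: a prediction covering 10 s to 20 s against a
# ground-truth activity spanning 15 s to 30 s.
result = hits_at_thresholds((10.0, 20.0), (15.0, 30.0))
```

In this example the intersection is 5 s and the union is 20 s, so the IoU is 0.25: the prediction counts as a match at thresholds 0.1 and 0.2 but not at 0.3 and above. A full mAP computation would additionally rank predictions by confidence and average precision over classes, which this sketch omits.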

Related Papers

CBF-AFA: Chunk-Based Multi-SSL Fusion for Automatic Fluency Assessment (2025-06-25)
MultiHuman-Testbench: Benchmarking Image Generation for Multiple Humans (2025-06-25)
Distributed Activity Detection for Cell-Free Hybrid Near-Far Field Communications (2025-06-17)
Speaker Diarization with Overlapping Community Detection Using Graph Attention Networks and Label Propagation Algorithm (2025-06-03)
Attention Is Not Always the Answer: Optimizing Voice Activity Detection with Simple Feature Fusion (2025-06-02)
Joint Activity Detection and Channel Estimation for Massive Connectivity: Where Message Passing Meets Score-Based Generative Priors (2025-05-31)
Towards Robust Overlapping Speech Detection: A Speaker-Aware Progressive Approach Using WavLM (2025-05-29)
Robust Activity Detection for Massive Random Access (2025-05-21)