

Temporal RoI Align for Video Object Recognition

Tao Gong, Kai Chen, Xinjiang Wang, Qi Chu, Feng Zhu, Dahua Lin, Nenghai Yu, Huamin Feng

Date: 2021-09-08
Tasks: Video Object Detection, Object Recognition, Semantic Segmentation, Instance Segmentation, Video Instance Segmentation, Object Detection

Abstract

Video object detection is challenging when appearance deteriorates in certain video frames. It is therefore natural to aggregate temporal information from other frames of the same video into the current frame. However, RoI Align, one of the core procedures of video detectors, still extracts proposal features from a single-frame feature map, so the extracted RoI features lack temporal information from the video. In this work, considering that the features of the same object instance are highly similar across frames in a video, a novel Temporal RoI Align operator is proposed that uses feature similarity to extract features from other frames' feature maps for current-frame proposals. The proposed operator can extract temporal information from the entire video for proposals. We integrate it into single-frame video detectors and other state-of-the-art video detectors, and conduct quantitative experiments to demonstrate that it consistently and significantly boosts performance. In addition, Temporal RoI Align can also be applied to video instance segmentation. Code is available at https://github.com/open-mmlab/mmtracking
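
The abstract's core idea, pulling features for a current-frame proposal out of other frames' feature maps by feature similarity, can be illustrated with a small NumPy sketch. This is not the authors' implementation (that lives in the mmtracking repository linked above). The function name `similarity_aggregate`, the `top_k` parameter, the cosine similarity measure, and the softmax weighting are all assumptions chosen for illustration.

```python
import numpy as np


def l2_normalize(x, axis, eps=1e-12):
    """Scale vectors along `axis` to unit length (dot product = cosine similarity)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def softmax(x, axis):
    """Numerically stable softmax along `axis`."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def similarity_aggregate(roi_feat, support_maps, top_k=4):
    """Hypothetical sketch: enrich one current-frame RoI feature with similar
    features taken from other frames.

    roi_feat:     (C, H, W) RoI feature of one proposal in the current frame.
    support_maps: (T, C, Hs, Ws) feature maps of T other frames.
    Returns an array of shape (C, H, W).
    """
    C, H, W = roi_feat.shape
    T = support_maps.shape[0]

    query = roi_feat.reshape(C, H * W).T                       # (N, C), N = H*W
    keys = support_maps.reshape(T, C, -1).transpose(0, 2, 1)   # (T, M, C), M = Hs*Ws

    # Cosine similarity between every RoI location and every support location.
    sim = np.einsum("nc,tmc->tnm",
                    l2_normalize(query, 1), l2_normalize(keys, 2))  # (T, N, M)

    # Assumption: keep the top_k most similar support locations per RoI location.
    idx = np.argpartition(sim, -top_k, axis=2)[:, :, -top_k:]  # (T, N, K)
    top_sim = np.take_along_axis(sim, idx, axis=2)             # (T, N, K)
    weights = softmax(top_sim, axis=2)                         # (T, N, K)

    # Gather the selected support features and average them per frame.
    t_index = np.arange(T)[:, None, None]
    picked = keys[t_index, idx]                                # (T, N, K, C)
    support_roi = (weights[..., None] * picked).sum(axis=2)    # (T, N, C)

    # Assumption: fuse current and support RoI features across frames,
    # weighting each frame by its similarity to the current feature.
    stacked = np.concatenate([query[None], support_roi], axis=0)  # (T+1, N, C)
    frame_sim = np.einsum("nc,tnc->tn",
                          l2_normalize(query, 1), l2_normalize(stacked, 2))
    frame_w = softmax(frame_sim, axis=0)                       # (T+1, N)
    fused = (frame_w[..., None] * stacked).sum(axis=0)         # (N, C)
    return fused.T.reshape(C, H, W)


# Usage with random data standing in for real detector features.
rng = np.random.default_rng(0)
roi = rng.normal(size=(8, 7, 7))          # one 7x7 RoI feature with 8 channels
others = rng.normal(size=(3, 8, 16, 16))  # feature maps of 3 other frames
out = similarity_aggregate(roi, others, top_k=4)
print(out.shape)  # (8, 7, 7)
```

Because every step is a weighted average, the output keeps the shape of the input RoI feature and can stand in for it in a detection head. A real detector would work on batched tensors and learn the weighting rather than fixing it.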

Results

Task                         Dataset                        Metric    Value  Model
Object Detection             EPIC KITCHENS-seen splits      mAP       42.2   Temporal ROI Align
Object Detection             ImageNet VID                   mAP       84.3   Temporal ROI Align (ResNeXt101)
Object Detection             EPIC KITCHENS-unseen splits    mAP       39.6   Temporal ROI Align
3D                           EPIC KITCHENS-seen splits      mAP       42.2   Temporal ROI Align
3D                           ImageNet VID                   mAP       84.3   Temporal ROI Align (ResNeXt101)
3D                           EPIC KITCHENS-unseen splits    mAP       39.6   Temporal ROI Align
Video Instance Segmentation  YouTube-VIS                    mask AP   38     Temporal ROI Align
2D Classification            EPIC KITCHENS-seen splits      mAP       42.2   Temporal ROI Align
2D Classification            ImageNet VID                   mAP       84.3   Temporal ROI Align (ResNeXt101)
2D Classification            EPIC KITCHENS-unseen splits    mAP       39.6   Temporal ROI Align
2D Object Detection          EPIC KITCHENS-seen splits      mAP       42.2   Temporal ROI Align
2D Object Detection          ImageNet VID                   mAP       84.3   Temporal ROI Align (ResNeXt101)
2D Object Detection          EPIC KITCHENS-unseen splits    mAP       39.6   Temporal ROI Align
16k                          EPIC KITCHENS-seen splits      mAP       42.2   Temporal ROI Align
16k                          ImageNet VID                   mAP       84.3   Temporal ROI Align (ResNeXt101)
16k                          EPIC KITCHENS-unseen splits    mAP       39.6   Temporal ROI Align
