
Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Integrating Boxes and Masks: A Multi-Object Framework for Unified Visual Tracking and Segmentation

Yuanyou Xu, Zongxin Yang, Yi Yang

2023-08-25 | ICCV 2023
Tasks: Visual Object Tracking, Visual Tracking, Representation Learning, Segmentation, Semantic Segmentation, Video Object Segmentation, Object Tracking, Video Semantic Segmentation

Abstract

Tracking any given object(s) spatially and temporally is a common purpose in Visual Object Tracking (VOT) and Video Object Segmentation (VOS). Joint tracking and segmentation has been attempted in some studies, but these approaches often lack full compatibility with both boxes and masks in initialization and prediction, and they mainly focus on single-object scenarios. To address these limitations, this paper proposes a Multi-object Mask-box Integrated framework for unified Tracking and Segmentation, dubbed MITS. First, a unified identification module is proposed to support both box and mask references for initialization, where detailed object information is inferred from boxes or directly retained from masks. Additionally, a novel pinpoint box predictor is proposed for accurate multi-object box prediction, facilitating target-oriented representation learning. All target objects are processed simultaneously from encoding to propagation and decoding, as a unified pipeline for VOT and VOS. Experimental results show that MITS achieves state-of-the-art performance on both VOT and VOS benchmarks. Notably, MITS surpasses the best prior VOT competitor by around 6% on the GOT-10k test set, and significantly improves the performance of box initialization on VOS benchmarks. The code is available at https://github.com/yoxu515/MITS.
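The abstract treats boxes and masks as interchangeable references, but the two are not symmetric: a mask determines a tight box, whereas a box does not determine a mask. The sketch below illustrates only the easy direction, under stated assumptions. It is not the paper's identification module or pinpoint box predictor; the function name `mask_to_box` and the inclusive `(x_min, y_min, x_max, y_max)` convention are hypothetical choices made for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical convention: inclusive pixel coordinates (x_min, y_min, x_max, y_max).
Box = Tuple[int, int, int, int]


def mask_to_box(mask: List[List[int]]) -> Optional[Box]:
    """Return the tightest axis-aligned box around the nonzero pixels of a
    binary mask given as rows of 0/1 values, or None if the mask is empty."""
    # Column indices of every foreground pixel.
    xs = [x for row in mask for x, value in enumerate(row) if value]
    # Row indices of every row that contains at least one foreground pixel.
    ys = [y for y, row in enumerate(mask) if any(row)]
    if not xs:
        return None
    return (min(xs), min(ys), max(xs), max(ys))


example_mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 0, 1, 0],
]
example_box = mask_to_box(example_mask)
```

Going the other way, from a box to detailed object information, has no such closed-form answer, which is the gap the abstract says the unified identification module addresses.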

Results

Task                    Dataset      Metric                Value  Model
Object Tracking         LaSOT        AUC                   72     MITS
Object Tracking         LaSOT        Normalized Precision  80.1   MITS
Object Tracking         LaSOT        Precision             78.5   MITS
Object Tracking         GOT-10k      Average Overlap       80.4   MITS
Object Tracking         GOT-10k      Success Rate 0.5      89.8   MITS
Object Tracking         GOT-10k      Success Rate 0.75     75.8   MITS
Object Tracking         TrackingNet  Accuracy              83.4   MITS
Object Tracking         TrackingNet  Normalized Precision  88.9   MITS
Object Tracking         TrackingNet  Precision             84.6   MITS
Visual Object Tracking  LaSOT        AUC                   72     MITS
Visual Object Tracking  LaSOT        Normalized Precision  80.1   MITS
Visual Object Tracking  LaSOT        Precision             78.5   MITS
Visual Object Tracking  GOT-10k      Average Overlap       80.4   MITS
Visual Object Tracking  GOT-10k      Success Rate 0.5      89.8   MITS
Visual Object Tracking  GOT-10k      Success Rate 0.75     75.8   MITS
Visual Object Tracking  TrackingNet  Accuracy              83.4   MITS
Visual Object Tracking  TrackingNet  Normalized Precision  88.9   MITS
Visual Object Tracking  TrackingNet  Precision             84.6   MITS
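
For readers unfamiliar with the GOT-10k metrics listed above, the following is a minimal sketch of how Average Overlap and Success Rate are commonly computed from per-frame box overlaps. It assumes boxes in `(x, y, w, h)` form and a single pooled list of frames; the function names are hypothetical, and the benchmark's official evaluation may aggregate across sequences differently. This is not the evaluation code behind the numbers in the table.

```python
from typing import Sequence, Tuple

# Assumed box format: (x, y, width, height) with the origin at the top-left.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = inter_w * inter_h
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def average_overlap(pred: Sequence[Box], gt: Sequence[Box]) -> float:
    """Mean IoU over all frames (assumes pred and gt are equally long and non-empty)."""
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    return sum(overlaps) / len(overlaps)


def success_rate(pred: Sequence[Box], gt: Sequence[Box], threshold: float) -> float:
    """Fraction of frames whose IoU strictly exceeds the threshold."""
    overlaps = [iou(p, g) for p, g in zip(pred, gt)]
    return sum(o > threshold for o in overlaps) / len(overlaps)


# Toy data, invented purely for illustration.
gt_boxes = [(0.0, 0.0, 10.0, 10.0)] * 4
pred_boxes = [
    (0.0, 0.0, 10.0, 10.0),   # perfect match
    (0.0, 0.0, 10.0, 5.0),    # covers half of the target
    (5.0, 0.0, 10.0, 10.0),   # shifted by half a box width
    (20.0, 20.0, 5.0, 5.0),   # complete miss
]
ao = average_overlap(pred_boxes, gt_boxes)
sr_50 = success_rate(pred_boxes, gt_boxes, 0.5)
sr_75 = success_rate(pred_boxes, gt_boxes, 0.75)
```

Both functions return fractions in [0, 1]; the table appears to report the same quantities as percentages.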
