Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Universal Instance Perception as Object Discovery and Retrieval

Bin Yan, Yi Jiang, Jiannan Wu, Dong Wang, Ping Luo, Zehuan Yuan, Huchuan Lu

2023-03-12 · CVPR 2023
Tasks: Visual Object Tracking, Described Object Detection, Referring Expression, Visual Tracking, Generalized Referring Expression Comprehension, Multi-Object Tracking and Segmentation, Referring Expression Comprehension, Zero Shot Segmentation, Referring Video Object Segmentation, Referring Expression Segmentation, Semantic Segmentation, Object Discovery, Object Tracking, Instance Segmentation, Multiple Object Tracking, Retrieval, Video Instance Segmentation, Object Detection

Abstract

All instance perception tasks aim at finding certain objects specified by some queries, such as category names, language expressions, and target annotations, but the field as a whole has been split into multiple independent subtasks. In this work, we present a next-generation universal instance perception model, termed UNINEXT. UNINEXT reformulates diverse instance perception tasks into a unified object discovery and retrieval paradigm and can flexibly perceive different types of objects simply by changing the input prompts. This unified formulation brings the following benefits: (1) enormous amounts of data from different tasks and label vocabularies can be exploited to jointly train general instance-level representations, which is especially beneficial for tasks that lack training data; (2) the unified model is parameter-efficient and can save redundant computation when handling multiple tasks simultaneously. UNINEXT shows superior performance on 20 challenging benchmarks from 10 instance-level tasks, including classical image-level tasks (object detection and instance segmentation), vision-and-language tasks (referring expression comprehension and segmentation), and six video-level object tracking tasks. Code is available at https://github.com/MasterBin-IIAU/UNINEXT.
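The retrieval half of the discovery-and-retrieval paradigm can be illustrated with a toy sketch. It assumes that object discovery has already produced a set of candidate instances, each with a fixed-length embedding, and that each prompt item (a category name, a referring expression, or a target annotation) has been encoded into an embedding of the same size. The `Candidate` type, dot-product-plus-sigmoid scoring, and the 0.5 threshold are hypothetical choices made for illustration, not UNINEXT's implementation, and the sketch does not model how the candidates are discovered.

```python
from __future__ import annotations

import math
from dataclasses import dataclass

# Toy sketch of prompt-based retrieval over already-discovered instances.
# NOT UNINEXT's code: Candidate, the dot-product + sigmoid score, and the
# default threshold are illustrative assumptions.


@dataclass
class Candidate:
    """One discovered instance: a box plus a fixed-length embedding (assumed)."""
    box: tuple[float, float, float, float]  # (x1, y1, x2, y2)
    embedding: list[float]


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x: float) -> float:
    # Numerically stable logistic function (avoids overflow for large |x|).
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def retrieve(
    candidates: list[Candidate],
    prompt_embeddings: list[list[float]],
    threshold: float = 0.5,
) -> list[tuple[int, int, float]]:
    """Score every (candidate, prompt) pair and keep pairs scoring above threshold.

    Returns (candidate_index, prompt_index, score) triples, best first.
    A category-name prompt would supply one embedding per category; a
    referring-expression prompt would supply a single embedding.
    """
    matches = []
    for i, cand in enumerate(candidates):
        for j, prompt in enumerate(prompt_embeddings):
            score = sigmoid(dot(cand.embedding, prompt))
            if score > threshold:
                matches.append((i, j, score))
    # Highest score first; ties keep insertion order (sort is stable).
    matches.sort(key=lambda m: m[2], reverse=True)
    return matches


if __name__ == "__main__":
    # Three hypothetical discovered instances with 2-D embeddings.
    found = [
        Candidate((0, 0, 10, 10), [1.0, 0.0]),
        Candidate((5, 5, 20, 20), [0.0, 1.0]),
        Candidate((1, 1, 2, 2), [-1.0, -1.0]),
    ]
    # Detection-style prompt: two category embeddings.
    print(retrieve(found, [[3.0, 0.0], [0.0, 3.0]]))
    # Referring-style prompt: one expression embedding.
    print(retrieve(found, [[0.0, 2.0]]))
```

Running the module prints the matches for a two-category prompt and then for a single-expression prompt. The candidates stay fixed and only the prompt embeddings change, which mirrors in miniature the idea of switching tasks by switching prompts.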

Results

Task | Dataset | Metric | Value | Model
Multiple Object Tracking | BDD100K val | mIDF1 | 56.7 | UNINEXT-H
Multiple Object Tracking | BDD100K val | mMOTA | 44.2 | UNINEXT-H
Multi-Object Tracking and Segmentation | BDD100K val | mMOTSA | 35.7 | UNINEXT-H
Visual Tracking | TNL2K | AUC | 59.3 | UNINEXT-H
Visual Tracking | TNL2K | Precision | 62.8 | UNINEXT-H
Visual Object Tracking | LaSOT | AUC | 72.4 | UNINEXT-L
Visual Object Tracking | LaSOT | Normalized Precision | 80.7 | UNINEXT-L
Visual Object Tracking | LaSOT | Precision | 78.9 | UNINEXT-L
Visual Object Tracking | LaSOT | AUC | 72.2 | UNINEXT-H
Visual Object Tracking | LaSOT | Normalized Precision | 80.8 | UNINEXT-H
Visual Object Tracking | LaSOT | Precision | 79.4 | UNINEXT-H
Visual Object Tracking | LaSOT-ext | AUC | 56.2 | UNINEXT-H
Visual Object Tracking | LaSOT-ext | Normalized Precision | 63.8 | UNINEXT-H
Visual Object Tracking | LaSOT-ext | Precision | 63.8 | UNINEXT-H
Visual Object Tracking | TrackingNet | Accuracy | 85.4 | UNINEXT-H
Visual Object Tracking | TrackingNet | Normalized Precision | 89 | UNINEXT-H
Visual Object Tracking | TrackingNet | Precision | 86.4 | UNINEXT-H
Object Detection | COCO minival | AP50 | 77.5 | UNINEXT-H
Object Detection | COCO minival | AP75 | 66.7 | UNINEXT-H
Object Detection | COCO minival | APL | 75.3 | UNINEXT-H
Object Detection | COCO minival | APM | 64.8 | UNINEXT-H
Object Detection | COCO minival | APS | 45.1 | UNINEXT-H
Object Detection | COCO minival | box AP | 60.6 | UNINEXT-H
Object Detection | Description Detection Dataset | Intra-scenario ABS mAP | 15.9 | UNINEXT-large
Object Detection | Description Detection Dataset | Intra-scenario FULL mAP | 17.9 | UNINEXT-large
Object Detection | Description Detection Dataset | Intra-scenario PRES mAP | 18.6 | UNINEXT-large
Instance Segmentation | COCO test-dev | AP50 | 76.2 | UNINEXT-H
Instance Segmentation | COCO test-dev | AP75 | 56.7 | UNINEXT-H
Instance Segmentation | COCO test-dev | APL | 67.5 | UNINEXT-H
Instance Segmentation | COCO test-dev | APM | 55.9 | UNINEXT-H
Instance Segmentation | COCO test-dev | APS | 33.3 | UNINEXT-H
Instance Segmentation | COCO test-dev | mask AP | 51.8 | UNINEXT-H
Referring Expression Segmentation | RefCOCO val | Overall IoU | 82.19 | UNINEXT-H
Referring Expression Segmentation | RefCOCO+ val | Overall IoU | 72.47 | UNINEXT-H
Referring Expression Segmentation | RefCOCO+ testA | Overall IoU | 76.42 | UNINEXT-H
Referring Expression Segmentation | RefCOCO+ testB | Overall IoU | 66.22 | UNINEXT-H
Referring Expression Segmentation | Refer-YouTube-VOS (2021 public validation) | F | 72.7 | UNINEXT-H
Referring Expression Segmentation | Refer-YouTube-VOS (2021 public validation) | J | 67.6 | UNINEXT-H
Referring Expression Segmentation | Refer-YouTube-VOS (2021 public validation) | J&F | 70.1 | UNINEXT-H
Referring Expression Segmentation | DAVIS 2017 (val) | J&F 1st frame | 72.5 | UNINEXT-H
Zero Shot Segmentation | Segmentation in the Wild | Mean AP | 42.1 | UNINEXT
Video Instance Segmentation | OVIS validation | AP50 | 72.5 | UNINEXT (ViT-H, Online)
Video Instance Segmentation | OVIS validation | AP75 | 52.2 | UNINEXT (ViT-H, Online)
Video Instance Segmentation | OVIS validation | mask AP | 49 | UNINEXT (ViT-H, Online)
Video Instance Segmentation | OVIS validation | AP50 | 55.5 | UNINEXT (ResNet-50, Online)
Video Instance Segmentation | OVIS validation | AP75 | 35.6 | UNINEXT (ResNet-50, Online)
Video Instance Segmentation | OVIS validation | mask AP | 34 | UNINEXT (ResNet-50, Online)
Generalized Referring Expression Comprehension | gRefCOCO | N-acc. | 50.6 | UNINEXT
Generalized Referring Expression Comprehension | gRefCOCO | Precision@(F1=1, IoU≥0.5) | 58.2 | UNINEXT

Several results are listed under more than one task and are shown once above: the BDD100K val mIDF1 and mMOTA results also appear under Video and Object Tracking; the LaSOT, LaSOT-ext and TrackingNet results also appear under Object Tracking; the COCO minival and Description Detection Dataset results also appear under 2D Object Detection, 2D Classification, 3D and 16k; and the RefCOCO, RefCOCO+, Refer-YouTube-VOS and DAVIS 2017 results also appear under Instance Segmentation.
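Two metrics in the results are aggregates rather than per-image scores. J&F (Refer-YouTube-VOS, DAVIS 2017) is conventionally the mean of the region similarity J (a Jaccard index) and the contour accuracy F. Overall IoU on RefCOCO and RefCOCO+ is conventionally the total intersection divided by the total union accumulated over the whole split, which weights large objects more heavily than a per-sample mean IoU does. The sketch below computes both under the assumption that a binary mask is a set of pixel coordinates; the helper names are hypothetical and not taken from any benchmark toolkit.

```python
# Sketch of two aggregate metrics; helper names are hypothetical.
# Assumption for brevity: a binary mask is a set of (row, col) pixel coordinates.

Pixel = tuple[int, int]


def j_and_f(j: float, f: float) -> float:
    """J&F: mean of region similarity J and contour accuracy F."""
    return (j + f) / 2.0


def _check_lengths(predictions, ground_truths) -> None:
    if len(predictions) != len(ground_truths):
        raise ValueError("need exactly one ground-truth mask per prediction")


def overall_iou(predictions: list[set[Pixel]], ground_truths: list[set[Pixel]]) -> float:
    """Overall (cumulative) IoU: summed intersections divided by summed unions."""
    _check_lengths(predictions, ground_truths)
    inter = sum(len(p & g) for p, g in zip(predictions, ground_truths))
    union = sum(len(p | g) for p, g in zip(predictions, ground_truths))
    return inter / union if union else 0.0


def mean_iou(predictions: list[set[Pixel]], ground_truths: list[set[Pixel]]) -> float:
    """Per-sample IoU averaged over samples, shown for contrast.

    A sample whose prediction and ground truth are both empty scores 0.0 here;
    that convention is a choice made for this sketch.
    """
    _check_lengths(predictions, ground_truths)
    ious = [
        len(p & g) / len(p | g) if (p | g) else 0.0
        for p, g in zip(predictions, ground_truths)
    ]
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    big = {(r, c) for r in range(2) for c in range(4)}  # 8-pixel object
    preds = [{(0, 0), (0, 1)}, big]  # small half-correct mask, large perfect mask
    gts = [{(0, 0)}, big]
    print("mean IoU:", mean_iou(preds, gts))
    print("overall IoU:", overall_iou(preds, gts))
    print("J&F:", j_and_f(67.6, 72.7))
```

In the toy call, one small, half-correct mask and one large, perfect mask give a mean IoU of 0.75 but an overall IoU of 0.9. With the Refer-YouTube-VOS values, `j_and_f(67.6, 72.7)` gives about 70.15, which agrees with the reported J&F of 70.1 up to rounding of the inputs.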
