Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Integrally Migrating Pre-trained Transformer Encoder-decoders for Visual Object Detection

Feng Liu, Xiaosong Zhang, Zhiliang Peng, Zonghao Guo, Fang Wan, Xiangyang Ji, Qixiang Ye

Published: 2022-05-19 · ICCV 2023
Tasks: Few-Shot Object Detection, Representation Learning, Object Detection
Links: Paper · PDF · Code (official)

Abstract

Modern object detectors have taken advantage of backbone networks pre-trained on large-scale datasets. Other than the backbone, however, components such as the detector head and the feature pyramid network (FPN) remain trained from scratch, which hinders fully tapping the potential of representation models. In this study, we propose to integrally migrate pre-trained transformer encoder-decoders (imTED) to a detector, constructing a feature extraction path that is ``fully pre-trained" so that the detector's generalization capacity is maximized. The essential differences between imTED and the baseline detector are twofold: (1) migrating the pre-trained transformer decoder to the detector head while removing the randomly initialized FPN from the feature extraction path; and (2) defining a multi-scale feature modulator (MFM) to enhance scale adaptability. These designs not only significantly reduce the number of randomly initialized parameters but also unify detector training with representation learning. Experiments on the MS COCO object detection dataset show that imTED consistently outperforms its counterparts by $\sim$2.4 AP. Without bells and whistles, imTED improves the state of the art of few-shot object detection by up to 7.6 AP. Code is available at https://github.com/LiewFeng/imTED.
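The abstract's feature-extraction path can be sketched as a minimal pipeline: the pre-trained encoder serves as the backbone, RoI features are extracted directly from its output (no FPN in between), a multi-scale modulator blends in global context, and the pre-trained decoder acts as the detector head. The sketch below is an illustrative assumption, not the paper's implementation; all shapes, the toy RoI sampler, and the averaging-based modulator are hypothetical stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(patches):
    # Stand-in for a pre-trained ViT encoder: returns token features.
    return patches  # (num_patches, dim)

def roi_align(features, num_rois, tokens_per_roi):
    # Toy RoI extraction: sample a token subset per region proposal.
    idx = rng.integers(0, features.shape[0], (num_rois, tokens_per_roi))
    return features[idx]  # (num_rois, tokens_per_roi, dim)

def mfm(roi_feats, global_feats):
    # Multi-scale feature modulator (illustrative): blend per-RoI tokens
    # with a pooled global context vector for scale adaptability.
    context = global_feats.mean(axis=0)  # (dim,)
    return roi_feats + 0.5 * context     # broadcast over RoI tokens

def decoder_head(roi_feats):
    # Stand-in for the migrated pre-trained decoder used as detector head:
    # pool each RoI's tokens into one representation for box/class branches.
    return roi_feats.mean(axis=1)        # (num_rois, dim)

patches = rng.standard_normal((196, 768))  # 14x14 patches, ViT-B width
feats = encoder(patches)
rois = roi_align(feats, num_rois=4, tokens_per_roi=49)
modulated = mfm(rois, feats)
out = decoder_head(modulated)
print(out.shape)  # (4, 768)
```

Note that, unlike a standard detector, no randomly initialized FPN or head appears anywhere on this path: every learned stage is a stand-in for a pre-trained module.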

Results

Task                      | Dataset           | Metric | Value | Model
Few-Shot Object Detection | MS-COCO (30-shot) | AP     | 30.2  | imTED + ViT-B
Few-Shot Object Detection | MS-COCO (30-shot) | AP     | 21    | imTED + ViT-S
Few-Shot Object Detection | MS-COCO (10-shot) | AP     | 22.5  | imTED + ViT-B
Few-Shot Object Detection | MS-COCO (10-shot) | AP     | 15    | imTED + ViT-S

Related Papers

- Touch in the Wild: Learning Fine-Grained Manipulation with a Portable Visuo-Tactile Gripper (2025-07-20)
- Spectral Bellman Method: Unifying Representation and Exploration in RL (2025-07-17)
- Boosting Team Modeling through Tempo-Relational Representation Learning (2025-07-17)
- A Real-Time System for Egocentric Hand-Object Interaction Detection in Industrial Domains (2025-07-17)
- RS-TinyNet: Stage-wise Feature Fusion Network for Detecting Tiny Objects in Remote Sensing Images (2025-07-17)
- Decoupled PROB: Decoupled Query Initialization Tasks and Objectness-Class Learning for Open World Object Detection (2025-07-17)
- Dual LiDAR-Based Traffic Movement Count Estimation at a Signalized Intersection: Deployment, Data Collection, and Preliminary Analysis (2025-07-17)
- Similarity-Guided Diffusion for Contrastive Sequential Recommendation (2025-07-16)