JAFAR: Jack up Any Feature at Any Resolution

Paul Couairon, Loick Chambon, Louis Serrano, Jean-Emmanuel Haugeard, Matthieu Cord, Nicolas Thome

2025-06-10 · Feature Upsampling

Abstract

Foundation Vision Encoders have become essential for a wide range of dense vision tasks. However, their low-resolution spatial feature outputs necessitate feature upsampling to produce the high-resolution modalities required for downstream tasks. In this work, we introduce JAFAR, a lightweight and flexible feature upsampler that enhances the spatial resolution of visual features from any Foundation Vision Encoder to an arbitrary target resolution. JAFAR employs an attention-based module designed to promote semantic alignment between high-resolution queries, derived from low-level image features, and low-resolution keys that are semantically enriched through Spatial Feature Transform (SFT) modulation. Notably, despite the absence of high-resolution supervision, we demonstrate that learning at low upsampling ratios and resolutions generalizes remarkably well to significantly higher output scales. Extensive experiments show that JAFAR effectively recovers fine-grained spatial details and consistently outperforms existing feature upsampling methods across a diverse set of downstream tasks. Project page: https://jafar-upsampler.github.io
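The abstract describes the upsampler only at a high level. The sketch below is one plausible reading of it in plain Python, not JAFAR's implementation. Assumptions, all hypothetical: the encoder features act both as attention values and as the SFT conditioning signal for the keys (key * gamma + beta); attention is global over every low-resolution position; the queries are stand-ins taken from a coordinate grid rather than from low-level image features; and every function name and weight is made up for illustration. Because each output position is produced by one query, the target resolution is simply set by how many queries are supplied.

```python
import math
from typing import List

Vector = List[float]


def project(v: Vector, w: List[Vector]) -> Vector:
    """Linear map: v has length n, w is n rows x d cols; returns a length-d vector."""
    return [sum(v[i] * w[i][j] for i in range(len(v))) for j in range(len(w[0]))]


def sft_modulate(key: Vector, semantic: Vector, w_gamma: List[Vector], w_beta: List[Vector]) -> Vector:
    """SFT-style modulation: key * gamma(semantic) + beta(semantic), element-wise."""
    gamma = project(semantic, w_gamma)
    beta = project(semantic, w_beta)
    return [g * k + b for g, k, b in zip(gamma, key, beta)]


def softmax(scores: Vector) -> Vector:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_upsample(queries, keys, values, w_gamma, w_beta):
    """Cross-attention upsampling over flattened spatial positions (hypothetical sketch).

    queries: one D-dim vector per high-res output position (any count).
    keys:    one D-dim vector per low-res position.
    values:  one C-dim encoder feature per low-res position.
    Returns one C-dim feature per query.
    """
    d = len(queries[0])
    # Assumption: the encoder features themselves drive the SFT modulation of the keys.
    mod_keys = [sft_modulate(k, v, w_gamma, w_beta) for k, v in zip(keys, values)]
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in mod_keys]
        weights = softmax(scores)
        # Each output is a convex combination of the low-res encoder features.
        out.append([sum(a * v[c] for a, v in zip(weights, values)) for c in range(len(values[0]))])
    return out


def grid_queries(side: int) -> List[Vector]:
    """Hypothetical stand-in 2-D queries on a side x side grid (side >= 2), not image-derived."""
    step = 2 / (side - 1)
    return [[x * step - 1, y * step - 1] for y in range(side) for x in range(side)]


# Usage: a 2x2 low-res map (C=2) upsampled to 4x4 and to 6x6 with the same weights.
values = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]]
w_gamma = [[0.5, 0.5], [0.5, 0.5]]  # C x D, hypothetical weights
w_beta = [[0.1, 0.0], [0.0, 0.1]]   # C x D, hypothetical weights
out_4x4 = attention_upsample(grid_queries(4), keys, values, w_gamma, w_beta)
out_6x6 = attention_upsample(grid_queries(6), keys, values, w_gamma, w_beta)
```

Using attention rather than a fixed interpolation kernel means each high-resolution output chooses which low-resolution features to borrow from, and nothing in the computation ties it to a particular upsampling ratio.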

Results

Task	Dataset	Metric	Value	Model
Representation Learning	ImageNet	ADCC (↑)	73.3	JAFAR
Representation Learning	ImageNet	Average Drop (↓)	17.4	JAFAR
Representation Learning	ImageNet	Average Increase (↑)	30.9	JAFAR
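The metrics in the table are not defined on this page. The sketch below follows their common definitions from the saliency-map evaluation literature: Average Drop is the mean relative loss of class confidence when the image is masked by the explanation map (lower is better), Average Increase is the share of images whose confidence rises under masking (higher is better), and ADCC is the harmonic mean of coherency, 1 - complexity and 1 - average drop (higher is better). It assumes the per-image class scores on the full image (Y) and on the explanation-masked image (O) are already available, and it takes coherency and complexity as precomputed inputs; the function names are hypothetical.

```python
from typing import List


def average_drop(full_scores: List[float], masked_scores: List[float]) -> float:
    """100 * mean(max(0, Y - O) / Y) over images; lower is better."""
    terms = [max(0.0, y - o) / y for y, o in zip(full_scores, masked_scores)]
    return 100.0 * sum(terms) / len(terms)


def average_increase(full_scores: List[float], masked_scores: List[float]) -> float:
    """100 * fraction of images where the masked score O exceeds the full score Y."""
    hits = [1.0 if o > y else 0.0 for y, o in zip(full_scores, masked_scores)]
    return 100.0 * sum(hits) / len(hits)


def adcc(coherency: float, complexity: float, avg_drop: float) -> float:
    """Harmonic mean of coherency, 1 - complexity and 1 - average drop.

    All three inputs are fractions in [0, 1] (not percentages).
    """
    return 3.0 / (1.0 / coherency + 1.0 / (1.0 - complexity) + 1.0 / (1.0 - avg_drop))


# Usage with made-up scores for three images.
full = [0.9, 0.8, 0.5]
masked = [0.6, 0.8, 0.7]
ad = average_drop(full, masked)
ai = average_increase(full, masked)
```

Because ADCC is a harmonic mean, a low value in any one component pulls the score down, so a method cannot reach a high ADCC by doing well on only one of the three criteria.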
