Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


LiFT: A Surprisingly Simple Lightweight Feature Transform for Dense ViT Descriptors

Saksham Suri, Matthew Walmer, Kamal Gupta, Abhinav Shrivastava

2024-03-21 · Feature Upsampling · Object Discovery

Paper · PDF · Code

Abstract

We present a simple self-supervised method to enhance the performance of ViT features for dense downstream tasks. Our Lightweight Feature Transform (LiFT) is a straightforward and compact postprocessing network that can be applied to enhance the features of any pre-trained ViT backbone. LiFT is fast and easy to train with a self-supervised objective, and it boosts the density of ViT features for minimal extra inference cost. Furthermore, we demonstrate that LiFT can be applied with approaches that use additional task-specific downstream modules, as we integrate LiFT with ViTDet for COCO detection and segmentation. Despite the simplicity of LiFT, we find that it is not simply learning a more complex version of bilinear interpolation. Instead, our LiFT training protocol leads to several desirable emergent properties that benefit ViT features in dense downstream tasks. This includes greater scale invariance for features, and better object boundary maps. By simply training LiFT for a few epochs, we show improved performance on keypoint correspondence, detection, segmentation, and object discovery tasks. Overall, LiFT provides an easy way to unlock the benefits of denser feature arrays for a fraction of the computational cost. For more details, refer to our project page at https://www.cs.umd.edu/~sakshams/LiFT/.

Results

Task | Dataset | Metric | Value | Model
Representation Learning | ImageNet | ADCC | 53 | LiFT
Representation Learning | ImageNet | Average Drop | 66.9 | LiFT
Representation Learning | ImageNet | Average Increase | 8.7 | LiFT

Related Papers

When Does Pruning Benefit Vision Representations? (2025-07-02)
JAFAR: Jack up Any Feature at Any Resolution (2025-06-10)
FORLA: Federated Object-centric Representation Learning with Slot Attention (2025-06-03)
Binding threshold units with artificial oscillatory neurons (2025-05-06)
Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation (2025-05-04)
Hierarchical Compact Clustering Attention (COCA) for Unsupervised Object-Centric Learning (2025-05-04)
LoftUp: Learning a Coordinate-Based Feature Upsampler for Vision Foundation Models (2025-04-18)
Are We Done with Object-Centric Learning? (2025-04-09)