GreedyPrune: Retenting Critical Visual Token Set for Large Vision Language Models

Ruiguang Pei, Weiqing Sun, Zhihui Fu, Jun Wang

2025-06-16 · Combinatorial Optimization

Abstract

Although Large Vision Language Models (LVLMs) have demonstrated remarkable performance in image understanding tasks, their computational efficiency remains a significant challenge, particularly on resource-constrained devices due to the high cost of processing large numbers of visual tokens. Recently, training-free visual token pruning methods have gained popularity as a low-cost solution to this issue. However, existing approaches suffer from two key limitations: semantic saliency-based strategies primarily focus on high cross-attention visual tokens, often neglecting visual diversity, whereas visual diversity-based methods risk inadvertently discarding semantically important tokens, especially under high compression ratios. In this paper, we introduce GreedyPrune, a training-free plug-and-play visual token pruning algorithm designed to jointly optimize semantic saliency and visual diversity. We formalize the token pruning process as a combinatorial optimization problem and demonstrate that greedy algorithms effectively balance computational efficiency with model accuracy. Extensive experiments validate the effectiveness of our approach, showing that GreedyPrune achieves state-of-the-art accuracy across various multimodal tasks and models while significantly reducing end-to-end inference latency.
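To make the idea of jointly weighing semantic saliency and visual diversity concrete, here is a minimal sketch of one possible greedy selection loop. It is not the paper's algorithm: the abstract does not give the objective, so the scoring rule (saliency minus a redundancy penalty), the `trade_off` parameter, the use of cosine similarity, and the function names `cosine` and `greedy_select` are all illustrative assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def greedy_select(features, saliency, k, trade_off=0.5):
    """Greedily keep k visual tokens (hypothetical objective, not the paper's).

    features : list of token feature vectors
    saliency : list of per-token saliency scores (e.g. attention mass), assumed given
    k        : number of tokens to retain
    trade_off: assumed weight in [0, 1]; 1.0 means saliency only
    """
    n = len(features)
    k = min(k, n)
    selected = []
    remaining = set(range(n))
    # Highest positive similarity of each token to anything already kept.
    max_sim = [0.0] * n

    while len(selected) < k:
        # Score = weighted saliency minus weighted redundancy; ties go to the lower index.
        best = max(
            remaining,
            key=lambda i: (trade_off * saliency[i] - (1 - trade_off) * max_sim[i], -i),
        )
        selected.append(best)
        remaining.remove(best)
        # Update the redundancy penalty of the tokens still under consideration.
        for i in remaining:
            max_sim[i] = max(max_sim[i], cosine(features[i], features[best]))
    return selected


# Toy input: tokens 0 and 1 are near-duplicates, token 2 is distinct but less salient.
features = [[1.0, 0.0], [1.0, 0.01], [0.0, 1.0], [0.7, 0.7]]
saliency = [0.9, 0.85, 0.3, 0.5]

print(greedy_select(features, saliency, k=2))                 # [0, 2]
print(greedy_select(features, saliency, k=2, trade_off=1.0))  # [0, 1]
```

With the balanced weight, the second pick skips the near-duplicate token 1 in favor of the visually distinct token 2, whereas a saliency-only ranking keeps both near-duplicates. This mirrors the two failure modes the abstract describes, under the assumed scoring rule.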
