Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

CLIP-EBC: CLIP Can Count Accurately through Enhanced Blockwise Classification

Yiming Ma, Victor Sanchez, Tanaya Guha

2024-03-14
Tasks: Crowd Counting, Quantization, Density Estimation, Classification

Abstract

We propose CLIP-EBC, the first fully CLIP-based model for accurate crowd density estimation. While the CLIP model has demonstrated remarkable success in addressing recognition tasks such as zero-shot image classification, its potential for counting has been largely unexplored due to the inherent challenges in transforming a regression problem, such as counting, into a recognition task. In this work, we investigate and enhance CLIP's ability to count, focusing specifically on the task of estimating crowd sizes from images. Existing classification-based crowd-counting frameworks have significant limitations, including the quantization of count values into bordering real-valued bins and the sole focus on classification errors. These practices result in label ambiguity near the shared borders and inaccurate prediction of count values. Hence, directly applying CLIP within these frameworks may yield suboptimal performance. To address these challenges, we first propose the Enhanced Blockwise Classification (EBC) framework. Unlike previous methods, EBC utilizes integer-valued bins, effectively reducing ambiguity near bin boundaries. Additionally, it incorporates a regression loss based on density maps to improve the prediction of count values. Within our backbone-agnostic EBC framework, we then introduce CLIP-EBC to fully leverage CLIP's recognition capabilities for this task. Extensive experiments demonstrate the effectiveness of EBC and the competitive performance of CLIP-EBC. Specifically, our EBC framework can improve existing classification-based methods by up to 44.5% on the UCF-QNRF dataset, and CLIP-EBC achieves state-of-the-art performance on the NWPU-Crowd test set, with an MAE of 58.2 and an RMSE of 268.5, representing improvements of 8.6% and 13.3% over the previous best method, STEERER. The code and weights are available at https://github.com/Yiming-M/CLIP-EBC.
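The idea of integer-valued bins described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the bin boundaries, the representative value assigned to each bin, and the function names below are all hypothetical and chosen only for illustration. The sketch assumes each image block is classified into one bin and that a block count is decoded as the probability-weighted average of the bin values.

```python
from bisect import bisect_right

# Hypothetical integer-valued bins for per-block counts: {0}, {1}, {2}, {3, 4}, {5, ...}.
# Each bin is given by its inclusive lower bound; the last bin is open-ended.
BIN_LOWER_BOUNDS = [0, 1, 2, 3, 5]
# Hypothetical representative count for each bin, used when decoding a count.
BIN_VALUES = [0.0, 1.0, 2.0, 3.5, 6.0]


def count_to_bin(count: int) -> int:
    """Map an integer block count to the index of the bin that contains it."""
    if count < 0:
        raise ValueError("count must be non-negative")
    return bisect_right(BIN_LOWER_BOUNDS, count) - 1


def expected_count(probs: list[float]) -> float:
    """Decode a block count as the probability-weighted average of bin values."""
    if len(probs) != len(BIN_VALUES):
        raise ValueError("one probability per bin is required")
    return sum(p * v for p, v in zip(probs, BIN_VALUES))


def image_count(block_probs: list[list[float]]) -> float:
    """Sum the decoded block counts to estimate the count for the whole image."""
    return sum(expected_count(p) for p in block_probs)
```

Because every bin covers whole integers only, a block count such as 3 or 4 always falls strictly inside one bin, which is the boundary ambiguity the abstract says real-valued bins suffer from.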

Results

| Task | Dataset | Model | MAE | RMSE |
|---|---|---|---|---|
| Crowds | ShanghaiTech B | CLIP-EBC (ViT-L/14) | 5.9 | 9.2 |
| Crowds | ShanghaiTech B | CLIP-EBC (ResNet50) | 6 | 10.1 |
| Crowds | ShanghaiTech B | CLIP-EBC (ViT-B/16) | 6.6 | 10.5 |
| Crowds | ShanghaiTech B | CSRNet-EBC | 6.9 | 11.3 |
| Crowds | ShanghaiTech B | DMCount-EBC | 7 | 10.9 |
| Crowds | NWPU-Crowd (Val) | CLIP-EBC (ViT-L/14) | 32.3 | 79.7 |
| Crowds | NWPU-Crowd (Val) | CLIP-EBC (ViT-B/16) | 36.6 | 81.7 |
| Crowds | NWPU-Crowd (Val) | CLIP-EBC (ResNet50) | 38.6 | 90.3 |
| Crowds | NWPU-Crowd (Val) | DMCount-EBC | 39.6 | 95.8 |
| Crowds | NWPU-Crowd (Val) | CSRNet-EBC | 42.9 | 100.1 |
| Crowds | UCF-QNRF | DMCount-EBC (16, dynamic) | 75.9 | 130.48 |
| Crowds | UCF-QNRF | DMCount-EBC (32, dynamic) | 76.06 | 127.72 |
| Crowds | UCF-QNRF | DMCount-EBC | 77.2 | 130.4 |
| Crowds | UCF-QNRF | CSRNet-EBC | 79.3 | 135.8 |
| Crowds | UCF-QNRF | CLIP-EBC (ViT-B/16) | 80.3 | 139.3 |
| Crowds | UCF-QNRF | CLIP-EBC (ResNet50) | 80.5 | 136.6 |
| Crowds | ShanghaiTech A | CLIP-EBC (ViT-B/16) | 52.5 | 85.9 |
| Crowds | ShanghaiTech A | CLIP-EBC (ResNet50) | 54 | 83.2 |
| Crowds | ShanghaiTech A | DMCount-EBC | 62.3 | 98.9 |
| Crowds | ShanghaiTech A | CSRNet-EBC | 66.3 | 105 |
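For readers unfamiliar with the two metrics in the results, the following sketch shows how MAE and RMSE are conventionally computed in crowd counting, over per-image predicted and ground-truth counts. The function names and sample numbers are illustrative assumptions, not taken from the paper or its code.

```python
import math


def mae(predicted: list[float], actual: list[float]) -> float:
    """Mean absolute error between per-image predicted and true counts."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(predicted)


def rmse(predicted: list[float], actual: list[float]) -> float:
    """Root mean squared error between per-image predicted and true counts."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(predicted))


# Made-up counts for two images, purely to show the call pattern.
example_predicted = [10.0, 20.0]
example_actual = [12.0, 17.0]
example_mae = mae(example_predicted, example_actual)
example_rmse = rmse(example_predicted, example_actual)
```

RMSE squares each error before averaging, so it penalizes large per-image misses more heavily than MAE does. Lower is better for both.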

Related Papers

Efficient Deployment of Spiking Neural Networks on SpiNNaker2 for DVS Gesture Recognition Using Neuromorphic Intermediate Representation (2025-09-04)
Missing value imputation with adversarial random forests -- MissARF (2025-07-21)
An End-to-End DNN Inference Framework for the SpiNNaker2 Neuromorphic MPSoC (2025-07-18)
Task-Specific Audio Coding for Machines: Machine-Learned Latent Features Are Codes for That Machine (2025-07-17)
Angle Estimation of a Single Source with Massive Uniform Circular Arrays (2025-07-17)
Adversarial attacks to image classification systems using evolutionary algorithms (2025-07-17)
Efficient Calisthenics Skills Classification through Foreground Instance Selection and Depth Estimation (2025-07-16)
Safeguarding Federated Learning-based Road Condition Classification (2025-07-16)