Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.

Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.

Open-world Text-specified Object Counting

Niki Amini-Naieni, Kiana Amini-Naieni, Tengda Han, Andrew Zisserman

2023-06-02. Tasks: Zero-Shot Counting, Object Counting

Abstract

Our objective is open-world object counting in images, where the target object class is specified by a text description. To this end, we propose CounTX, a class-agnostic, single-stage model using a transformer decoder counting head on top of pre-trained joint text-image representations. CounTX is able to count the number of instances of any class given only an image and a text description of the target object class, and can be trained end-to-end. In addition to this model, we make the following contributions: (i) we compare the performance of CounTX to prior work on open-world object counting, and show that our approach exceeds the state of the art on all measures on the FSC-147 benchmark for methods that use text to specify the task; (ii) we present and release FSC-147-D, an enhanced version of FSC-147 with text descriptions, so that object classes can be described with more detailed language than their simple class names. FSC-147-D and the code are available at https://www.robots.ox.ac.uk/~vgg/research/countx.

Results

Task | Dataset | Metric | Value | Model
Object Counting | FSC147 | MAE (test) | 15.88 | CounTX (uses text descriptions instead of visual exemplars)
Object Counting | FSC147 | MAE (val) | 17.1 | CounTX (uses text descriptions instead of visual exemplars)
Object Counting | FSC147 | RMSE (test) | 106.29 | CounTX (uses text descriptions instead of visual exemplars)
Object Counting | FSC147 | RMSE (val) | 65.61 | CounTX (uses text descriptions instead of visual exemplars)
Object Counting | CARPK | MAE | 8.13 | CounTX (uses arbitrary text input to specify object to count, used "the cars" for CARPK)
Object Counting | CARPK | RMSE | 10.87 | CounTX (uses arbitrary text input to specify object to count, used "the cars" for CARPK)
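
The results above are reported as MAE and RMSE, which this page does not define. As a minimal sketch, assuming the conventional definitions used for counting benchmarks (mean absolute error and root mean squared error between predicted and ground-truth counts per image), the two metrics could be computed as follows. The function names and the toy counts are illustrative only and are not taken from the CounTX code or from any dataset.

```python
import math


def mae(pred_counts, true_counts):
    """Mean absolute error between predicted and ground-truth counts."""
    _check(pred_counts, true_counts)
    total = sum(abs(p - t) for p, t in zip(pred_counts, true_counts))
    return total / len(true_counts)


def rmse(pred_counts, true_counts):
    """Root mean squared error between predicted and ground-truth counts."""
    _check(pred_counts, true_counts)
    total = sum((p - t) ** 2 for p, t in zip(pred_counts, true_counts))
    return math.sqrt(total / len(true_counts))


def _check(pred_counts, true_counts):
    # Both sequences must describe the same, non-empty set of images.
    if len(pred_counts) != len(true_counts):
        raise ValueError("pred_counts and true_counts must have equal length")
    if len(true_counts) == 0:
        raise ValueError("at least one image is required")


if __name__ == "__main__":
    # Hypothetical per-image counts, made up for illustration.
    predicted = [10, 20, 33]
    ground_truth = [12, 20, 30]
    print(f"MAE:  {mae(predicted, ground_truth):.4f}")
    print(f"RMSE: {rmse(predicted, ground_truth):.4f}")
```

Because RMSE squares each error before averaging, a small number of images with very large count errors raises it far more than it raises MAE. Under these definitions, an RMSE much larger than the MAE on the same split (as in the FSC147 test row) would be consistent with a few badly miscounted images rather than uniformly large errors.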

Related Papers

Car Object Counting and Position Estimation via Extension of the CLIP-EBC Framework (2025-07-11)
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models (2025-06-03)
Improving Contrastive Learning for Referring Expression Counting (2025-05-28)
Expanding Zero-Shot Object Counting with Rich Prompts (2025-05-21)
InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition (2025-05-21)
VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning (2025-05-17)
Are Multimodal Large Language Models Ready for Omnidirectional Spatial Reasoning? (2025-05-17)
Learning What NOT to Count (2025-04-16)