Papers With Code


Encoder-Decoder Based Convolutional Neural Networks with Multi-Scale-Aware Modules for Crowd Counting

Pongpisit Thanasutives, Ken-ichi Fukui, Masayuki Numao, Boonserm Kijsirikul

2020-03-12 · Crowd Counting · Object Counting

Abstract

In this paper, we propose two modified neural networks, based on the dual path multi-scale fusion network (SFANet) and SegNet, for accurate and efficient crowd counting. Inspired by SFANet, the first model, named M-SFANet, is equipped with atrous spatial pyramid pooling (ASPP) and a context-aware module (CAN). The encoder of M-SFANet is enhanced with ASPP, which contains parallel atrous convolutional layers with different sampling rates and is hence able to extract multi-scale features of the target object and incorporate larger context. To further deal with scale variation throughout an input image, we leverage the CAN module, which adaptively encodes the scales of the contextual information. The combination yields an effective model for counting in both dense and sparse crowd scenes. Based on the SFANet decoder structure, M-SFANet's decoder has dual paths for density map and attention map generation. The second model, called M-SegNet, is produced by replacing the bilinear upsampling in SFANet with the max unpooling used in SegNet. This change yields a faster model while maintaining competitive counting performance. Designed for high-speed surveillance applications, M-SegNet has no additional multi-scale-aware module, so as not to increase complexity. Both models are encoder-decoder based architectures and are end-to-end trainable. We conduct extensive experiments on five crowd counting datasets and one vehicle counting dataset to show that these modifications yield algorithms that could improve state-of-the-art crowd counting methods. Codes are available at https://github.com/Pongpisit-Thanasutives/Variations-of-SFANet-for-Crowd-Counting.
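The key idea behind the ASPP block the abstract mentions is the atrous (dilated) convolution: the kernel taps are spread `rate` samples apart, so the same number of parameters covers a larger receptive field, and running several rates in parallel captures multiple scales at once. A minimal 1-D NumPy sketch (an illustration, not the paper's code; `dilated_conv1d` and `aspp_1d` are hypothetical helper names):

```python
import numpy as np

def dilated_conv1d(x, kernel, rate):
    """1-D atrous (dilated) convolution with 'valid' padding.

    The kernel taps are spaced `rate` samples apart, so a k-tap kernel
    spans (k - 1) * rate + 1 input samples without adding parameters.
    """
    k = len(kernel)
    span = (k - 1) * rate + 1            # effective receptive field
    out = np.empty(len(x) - span + 1)
    for i in range(len(out)):
        out[i] = sum(kernel[j] * x[i + j * rate] for j in range(k))
    return out

def aspp_1d(x, kernel, rates=(1, 2, 4)):
    """Toy ASPP: parallel branches at different sampling rates.

    In the real (2-D) ASPP the branch outputs are concatenated
    channel-wise and fused by a 1x1 convolution; here we just return
    the per-rate responses.
    """
    return [dilated_conv1d(x, kernel, r) for r in rates]

x = np.arange(10, dtype=float)
k = np.ones(3)

y1 = dilated_conv1d(x, k, rate=1)  # ordinary conv, receptive field 3
y2 = dilated_conv1d(x, k, rate=2)  # same 3 taps, receptive field 5
```

With `rate=2` the first output sums `x[0] + x[2] + x[4]`, showing how dilation widens the context each output position sees; M-SFANet's ASPP applies the same principle with 2-D convolutions over encoder feature maps.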

Results

| Task            | Dataset        | Metric      | Value  | Model             |
|-----------------|----------------|-------------|--------|-------------------|
| Crowd Counting  | ShanghaiTech B | MAE         | 6.32   | M-SFANet+M-SegNet |
| Crowd Counting  | ShanghaiTech B | MSE         | 10.06  | M-SFANet+M-SegNet |
| Crowd Counting  | UCF-QNRF       | MAE         | 85.6   | M-SFANet          |
| Object Counting | TRANCOS        | MAE         | 2.22   | M-SFANet+M-SegNet |
| Crowd Counting  | ShanghaiTech A | MAE         | 57.55  | M-SFANet+M-SegNet |
| Crowd Counting  | ShanghaiTech A | MSE         | 94.48  | M-SFANet+M-SegNet |
| Crowd Counting  | UCF CC 50      | MAE         | 162.33 | M-SFANet          |
| Crowd Counting  | WorldExpo’10   | Average MAE | 7.32   | M-SFANet          |

Related Papers

- Car Object Counting and Position Estimation via Extension of the CLIP-EBC Framework (2025-07-11)
- EBC-ZIP: Improving Blockwise Crowd Counting with Zero-Inflated Poisson Regression (2025-06-24)
- OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models (2025-06-03)
- Point-to-Region Loss for Semi-Supervised Point-Based Crowd Counting (2025-05-28)
- Improving Contrastive Learning for Referring Expression Counting (2025-05-28)
- InstructSAM: A Training-Free Framework for Instruction-Oriented Remote Sensing Object Recognition (2025-05-21)
- Expanding Zero-Shot Object Counting with Rich Prompts (2025-05-21)
- VisionReasoner: Unified Visual Perception and Reasoning via Reinforcement Learning (2025-05-17)