
VLPD: Context-Aware Pedestrian Detection via Vision-Language Semantic Self-Supervision

Mengyin Liu, Jie Jiang, Chao Zhu, Xu-Cheng Yin

2023-04-06 · CVPR 2023 · Autonomous Driving · Pedestrian Detection
Paper · PDF · Code (official)

Abstract

Detecting pedestrians accurately in urban scenes is significant for realistic applications like autonomous driving or video surveillance. However, confusing human-like objects often lead to wrong detections, and small-scale or heavily occluded pedestrians are easily missed due to their unusual appearances. To address these challenges, object regions alone are inadequate, so fully utilizing more explicit and semantic contexts becomes a key problem. Meanwhile, previous context-aware pedestrian detectors either learn only latent contexts from visual clues or require laborious annotations to obtain explicit semantic contexts. Therefore, we propose in this paper a novel approach via Vision-Language semantic self-supervision for context-aware Pedestrian Detection (VLPD) to model explicit semantic contexts without any extra annotations. Firstly, we propose a self-supervised Vision-Language Semantic (VLS) segmentation method, which learns both fully supervised pedestrian detection and contextual segmentation via explicit labels of semantic classes self-generated by vision-language models. Furthermore, a self-supervised Prototypical Semantic Contrastive (PSC) learning method is proposed to better discriminate pedestrians from other classes, based on the more explicit and semantic contexts obtained from VLS. Extensive experiments on popular benchmarks show that our proposed VLPD achieves superior performance over the previous state-of-the-art methods, particularly under challenging circumstances like small scale and heavy occlusion. Code is available at https://github.com/lmy98129/VLPD.
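
The abstract describes two self-supervised components: VLS, which uses a vision-language model to self-generate per-pixel semantic labels for contextual segmentation, and PSC, which contrasts pixel features against class prototypes. The sketch below illustrates how such pseudo-label generation and a prototype-based contrastive loss could look in PyTorch; the function names, tensor shapes, and class-prompt scheme are illustrative assumptions, not the authors' implementation (see the official repository for that).

```python
# Minimal sketch (not the authors' code): pixel-wise pseudo-labels from a
# CLIP-like vision-language model, plus a simple prototypical contrastive loss.
# All names and shapes are illustrative placeholders.
import torch
import torch.nn.functional as F

def vls_pseudo_labels(dense_features, text_embeddings):
    """Assign each pixel the semantic class whose text embedding is most similar.

    dense_features:  (B, C, H, W) features from the detector backbone
    text_embeddings: (K, C) embeddings of K class prompts (e.g. "a photo of a road")
    returns:         (B, H, W) integer pseudo-labels
    """
    feats = F.normalize(dense_features, dim=1)             # cosine-similarity space
    texts = F.normalize(text_embeddings, dim=1)
    logits = torch.einsum("bchw,kc->bkhw", feats, texts)   # per-pixel class scores
    return logits.argmax(dim=1)

def prototypical_contrastive_loss(dense_features, pseudo_labels, num_classes, tau=0.1):
    """Pull pixel features toward their class prototype, push them away from others."""
    B, C, H, W = dense_features.shape
    feats = F.normalize(dense_features, dim=1).permute(0, 2, 3, 1).reshape(-1, C)
    labels = pseudo_labels.reshape(-1)

    # Class prototypes: mean feature over all pixels assigned to each class.
    protos = torch.stack([
        feats[labels == k].mean(dim=0) if (labels == k).any()
        else torch.zeros(C, device=feats.device)
        for k in range(num_classes)
    ])
    protos = F.normalize(protos, dim=1)

    logits = feats @ protos.t() / tau                      # (num_pixels, K)
    return F.cross_entropy(logits, labels)

if __name__ == "__main__":
    feats = torch.randn(2, 512, 32, 64)   # stand-in backbone features
    texts = torch.randn(8, 512)           # stand-in embeddings for 8 classes
    labels = vls_pseudo_labels(feats, texts)
    loss = prototypical_contrastive_loss(feats, labels, num_classes=8)
    print(labels.shape, loss.item())
```

In this sketch the pseudo-labels supervise a contextual segmentation head with no extra annotation cost, while the contrastive term sharpens the separation between pedestrian features and other semantic classes, mirroring the division of labor the abstract attributes to VLS and PSC.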

Results

Task | Dataset | Metric | Value | Model
Autonomous Vehicles | Caltech | Heavy MR^-2 | 37.7 | VLPD
Autonomous Vehicles | Caltech | Reasonable Miss Rate | 2.3 | VLPD
Autonomous Vehicles | CityPersons | Bare MR^-2 | 6.1 | VLPD
Autonomous Vehicles | CityPersons | Heavy MR^-2 | 43.1 | VLPD
Autonomous Vehicles | CityPersons | Partial MR^-2 | 8.8 | VLPD
Autonomous Vehicles | CityPersons | Reasonable MR^-2 | 9.4 | VLPD
Autonomous Vehicles | CityPersons | Small MR^-2 | 10.9 | VLPD
Pedestrian Detection | Caltech | Heavy MR^-2 | 37.7 | VLPD
Pedestrian Detection | Caltech | Reasonable Miss Rate | 2.3 | VLPD
Pedestrian Detection | CityPersons | Bare MR^-2 | 6.1 | VLPD
Pedestrian Detection | CityPersons | Heavy MR^-2 | 43.1 | VLPD
Pedestrian Detection | CityPersons | Partial MR^-2 | 8.8 | VLPD
Pedestrian Detection | CityPersons | Reasonable MR^-2 | 9.4 | VLPD
Pedestrian Detection | CityPersons | Small MR^-2 | 10.9 | VLPD
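
All values above are miss-rate-based, so lower is better. MR^-2 commonly denotes the log-average miss rate used on the Caltech and CityPersons benchmarks: the miss rate is averaged over nine false-positives-per-image (FPPI) thresholds spaced evenly in log-space between 10^-2 and 10^0. Under that standard definition (an assumption here, as the page itself does not define the metric), it can be stated as:

```latex
% Log-average miss rate over FPPI in [10^-2, 10^0] (geometric mean of 9 samples)
\[
\mathrm{MR}^{-2} \;=\; \exp\!\Bigl(\tfrac{1}{9}\sum_{i=1}^{9}\ln \mathrm{MR}(f_i)\Bigr),
\qquad f_i \in \{10^{-2}, \dots, 10^{0}\}\ \text{(log-spaced FPPI values)}.
\]
```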

Related Papers

GEMINUS: Dual-aware Global and Scene-Adaptive Mixture-of-Experts for End-to-End Autonomous Driving (2025-07-19)
AGENTS-LLM: Augmentative GENeration of Challenging Traffic Scenarios with an Agentic LLM Framework (2025-07-18)
World Model-Based End-to-End Scene Generation for Accident Anticipation in Autonomous Driving (2025-07-17)
Orbis: Overcoming Challenges of Long-Horizon Prediction in Driving World Models (2025-07-17)
Channel-wise Motion Features for Efficient Motion Segmentation (2025-07-17)
LaViPlan: Language-Guided Visual Path Planning with RLVR (2025-07-17)
Safeguarding Federated Learning-based Road Condition Classification (2025-07-16)
Towards Autonomous Riding: A Review of Perception, Planning, and Control in Intelligent Two-Wheelers (2025-07-16)