Non-Hierarchical Transformers for Pedestrian Segmentation
Amani Kiruga, Xi Peng
Abstract
We propose a method for instance segmentation in autonomous systems, with a focus on accessibility and inclusivity. Our approach combines EVA-02, a non-hierarchical Vision Transformer (ViT) variant, with a Cascade Mask R-CNN mask head. After fine-tuning on the AVA instance segmentation challenge dataset, the model achieves a mean Average Precision (mAP) of 52.68% on the test set. These results indicate that ViT-based architectures are effective for this task and can help autonomous vision systems accommodate the needs of individuals with disabilities.
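For readers unfamiliar with the reported metric, the following is a minimal sketch of how Average Precision can be computed for instance masks at a single IoU threshold. It is not the challenge's official evaluation code: the function names, the greedy matching rule, the single-image and single-class setting, and the all-point interpolation are simplifying assumptions made here for illustration. The abstract does not state which IoU thresholds are used; under COCO-style conventions, mAP would additionally average this quantity over classes and over IoU thresholds from 0.5 to 0.95.

```python
import numpy as np


def mask_iou(a, b):
    """Intersection over union of two boolean masks of equal shape."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    return float(np.logical_and(a, b).sum() / union) if union else 0.0


def average_precision(pred_masks, scores, gt_masks, iou_thr=0.5):
    """AP for one class on one image (hypothetical helper, simplified)."""
    if len(gt_masks) == 0:
        return 0.0
    order = np.argsort(scores)[::-1]  # highest-confidence predictions first
    matched = set()                   # ground-truth indices already claimed
    tp = np.zeros(len(order))
    for rank, idx in enumerate(order):
        best_iou, best_gt = 0.0, -1
        for g, gt in enumerate(gt_masks):
            if g in matched:
                continue
            iou = mask_iou(pred_masks[idx], gt)
            if iou > best_iou:
                best_iou, best_gt = iou, g
        if best_iou >= iou_thr:
            tp[rank] = 1              # true positive; each ground truth matches once
            matched.add(best_gt)
    cum_tp = np.cumsum(tp)
    precision = cum_tp / (np.arange(len(order)) + 1)
    recall = cum_tp / len(gt_masks)
    # Area under the monotone precision envelope (all-point interpolation).
    mrec = np.concatenate([[0.0], recall, [1.0]])
    mpre = np.concatenate([[0.0], precision, [0.0]])
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    steps = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[steps + 1] - mrec[steps]) * mpre[steps + 1]))
```

In this sketch, a confident false positive ranked above a correct detection lowers the precision at every later rank, which is why AP rewards well-calibrated confidence scores as well as accurate masks.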