Papers With Code 2

A community resource for machine learning research: papers, code, benchmarks, and state-of-the-art results.


Data sourced from the PWC Archive (CC-BY-SA 4.0). Built by the community, for the community.


Context-aware Attentional Pooling (CAP) for Fine-grained Visual Classification

Ardhendu Behera, Zachary Wharton, Pradeep Hewage, Asish Bera

2021-01-17 | Informativeness | General Classification | Fine-Grained Image Classification

Abstract

Deep convolutional neural networks (CNNs) have shown a strong ability to mine discriminative object pose and parts information for image recognition. For fine-grained recognition, a rich, context-aware feature representation of the object/scene plays a key role, since the object/scene exhibits significant variance within the same subcategory and subtle variance among different subcategories. Finding the subtle variance that fully characterizes the object/scene is not straightforward. To address this, we propose a novel context-aware attentional pooling (CAP) that effectively captures subtle changes via sub-pixel gradients, and learns to attend to informative integral regions and their importance in discriminating different subcategories without requiring bounding-box and/or distinguishable part annotations. We also introduce a novel feature encoding by considering the intrinsic consistency between the informativeness of the integral regions and their spatial structures to capture the semantic correlation among them. Our approach is simple yet extremely effective and can be easily applied on top of a standard classification backbone network. We evaluate our approach using six state-of-the-art (SotA) backbone networks and eight benchmark datasets. Our method significantly outperforms the SotA approaches on six datasets and is very competitive on the remaining two.
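
The idea of attending to regions and then weighting them by importance can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes region features are already extracted as plain vectors, uses unlearned scaled dot-product attention, and omits the sub-pixel gradients and the feature encoding mentioned in the abstract. The names `attentional_pool`, `regions`, and `score_vec` are hypothetical and chosen only for illustration.

```python
import math

def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_sum(weights, vectors):
    # Weighted sum of equal-length vectors.
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

def attentional_pool(regions, score_vec):
    """Pool region features with attention (illustrative assumption only).

    regions:   list of R feature vectors, each of length D
    score_vec: hypothetical length-D vector used to score region importance
    """
    scale = math.sqrt(len(regions[0]))
    # Each region attends to every region; each row of attn sums to 1.
    attn = [softmax([dot(q, k) / scale for k in regions]) for q in regions]
    # Context-aware features: each region becomes a mix of all regions.
    context = [weighted_sum(row, regions) for row in attn]
    # Importance of each context-aware region, normalized to sum to 1.
    importance = softmax([dot(c, score_vec) for c in context])
    # Final descriptor: importance-weighted sum of the context features.
    pooled = weighted_sum(importance, context)
    return pooled, attn, importance

# Toy input: four regions with three-dimensional features.
regions = [[1.0, 0.0, 0.5], [0.9, 0.1, 0.4], [0.0, 1.0, 0.2], [0.1, 0.8, 0.3]]
score_vec = [0.5, -0.5, 1.0]
pooled, attn, importance = attentional_pool(regions, score_vec)
```

In a trained model the attention and importance scores would come from learned parameters on top of a CNN backbone; here they are fixed so the data flow is easy to follow.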

Results

Task | Dataset | Metric | Value | Model
Person Re-Identification | DukeMTMC-reID | Rank-10 | 91.8 | CAP
Image Classification | FGVC Aircraft | PARAMS | 34.2 | CAP
Image Classification | Food-101 | Accuracy | 98.6 | CAP
Image Classification | Food-101 | PARAMS | 34.2 | CAP
Image Classification | CUB-200-2011 | Accuracy | 91.8 | CAP
Fine-Grained Image Classification | FGVC Aircraft | PARAMS | 34.2 | CAP
Fine-Grained Image Classification | Food-101 | Accuracy | 98.6 | CAP
Fine-Grained Image Classification | Food-101 | PARAMS | 34.2 | CAP
Fine-Grained Image Classification | CUB-200-2011 | Accuracy | 91.8 | CAP

Related Papers

Multi-Agent Retrieval-Augmented Framework for Evidence-Based Counterspeech Against Health Misinformation (2025-07-09)
LumiCRS: Asymmetric Contrastive Prototype Learning for Long-Tail Conversational Movie Recommendation (2025-07-07)
Dynamic Bandwidth Allocation for Hybrid Event-RGB Transmission (2025-06-25)
Hierarchical Mask-Enhanced Dual Reconstruction Network for Few-Shot Fine-Grained Image Classification (2025-06-25)
Multi-Preference Lambda-weighted Listwise DPO for Dynamic Preference Alignment (2025-06-24)
Structural feature enhanced transformer for fine-grained image recognition (2025-06-14)
GPLQ: A General, Practical, and Lightning QAT Method for Vision Transformers (2025-06-13)
CuRe: Cultural Gaps in the Long Tail of Text-to-Image Systems (2025-06-09)