Wei Li, Xiatian Zhu, Shaogang Gong
Existing person re-identification (re-id) methods either assume the availability of well-aligned person bounding box images as model input or rely on constrained attention selection mechanisms to calibrate misaligned images. They are therefore sub-optimal for re-id matching in arbitrarily aligned person images, potentially with large human pose variations and unconstrained auto-detection errors. In this work, we show the advantages of jointly learning attention selection and feature representation in a Convolutional Neural Network (CNN) by maximising the complementary information of different levels of visual attention subject to re-id discriminative learning constraints. Specifically, we formulate a novel Harmonious Attention CNN (HA-CNN) model for joint learning of soft pixel attention and hard regional attention along with simultaneous optimisation of feature representations, dedicated to optimising person re-id in uncontrolled (misaligned) images. Extensive comparative evaluations validate the superiority of this new HA-CNN model for person re-id over a wide variety of state-of-the-art methods on three large-scale benchmarks: CUHK03, Market-1501, and DukeMTMC-ReID.
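To make the distinction between the two attention levels concrete, here is a minimal toy sketch in plain Python. It is not the HA-CNN architecture: the function names, the single-channel list-of-lists feature map, and the hand-supplied scores and crop box are all assumptions made for illustration. In a learned model, both the scores and the region location would be predicted by the network.

```python
import math

def soft_pixel_attention(feature_map, scores):
    """Soft attention: scale every pixel by a sigmoid gate in (0, 1).

    `scores` is a hypothetical per-pixel relevance map with the same
    shape as `feature_map`; no pixel is discarded, only re-weighted.
    """
    return [
        [f / (1.0 + math.exp(-s)) for f, s in zip(f_row, s_row)]
        for f_row, s_row in zip(feature_map, scores)
    ]

def hard_region_attention(feature_map, top, left, height, width):
    """Hard attention: keep one rectangular region and drop the rest."""
    return [row[left:left + width] for row in feature_map[top:top + height]]

feature_map = [[2.0, 4.0],
               [6.0, 8.0]]
neutral_scores = [[0.0, 0.0],
                  [0.0, 0.0]]  # sigmoid(0) = 0.5, so every pixel is halved

soft = soft_pixel_attention(feature_map, neutral_scores)
hard = hard_region_attention(feature_map, top=0, left=1, height=2, width=1)
```

Soft attention keeps the full spatial layout and only changes pixel weights, whereas hard attention selects a sub-region outright; the abstract's argument is that the two are complementary and benefit from being learned together.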
| Task | Dataset | Metric | Value (%) | Model |
|---|---|---|---|---|
| Person Re-Identification | CUHK03 detected | mAP | 38.6 | HA-CNN (CVPR'18) |
| Person Re-Identification | CUHK03 detected | Rank-1 | 41.7 | HA-CNN (CVPR'18) |
| Person Re-Identification | CUHK03 labeled | mAP | 41.0 | HA-CNN (CVPR'18) |
| Person Re-Identification | CUHK03 labeled | Rank-1 | 44.4 | HA-CNN (CVPR'18) |
| Person Re-Identification | CUHK03 | mAP | 38.6 | HA-CNN |
| Person Re-Identification | CUHK03 | Rank-1 | 41.7 | HA-CNN |
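For readers unfamiliar with the two metrics in the table, the following is a simplified sketch of how Rank-1 accuracy and mean average precision (mAP) can be computed from a query-to-gallery distance matrix. The function name and the toy data are assumptions for illustration. Standard benchmark protocols typically also exclude gallery images taken by the same camera as the query, which this sketch omits.

```python
def rank1_and_map(distances, query_ids, gallery_ids):
    """Return (Rank-1 accuracy, mAP) as fractions in [0, 1].

    distances[i][j] is the distance between query i and gallery image j;
    a smaller distance means a more similar pair.
    """
    rank1_hits = 0
    average_precisions = []
    for dist_row, qid in zip(distances, query_ids):
        # Sort gallery indices from most to least similar.
        order = sorted(range(len(gallery_ids)), key=lambda j: dist_row[j])
        matches = [gallery_ids[j] == qid for j in order]
        if not any(matches):
            continue  # skip queries with no true match in the gallery
        rank1_hits += int(matches[0])
        # Average precision: mean of precision at each true-match rank.
        found = 0
        precisions = []
        for rank, is_match in enumerate(matches, start=1):
            if is_match:
                found += 1
                precisions.append(found / rank)
        average_precisions.append(sum(precisions) / len(precisions))
    if not average_precisions:
        raise ValueError("no query has a matching gallery identity")
    n = len(average_precisions)
    return rank1_hits / n, sum(average_precisions) / n

# Toy example: two queries, three gallery images.
distances = [[0.1, 0.5, 0.9],   # query with identity 1
             [0.2, 0.8, 0.3]]   # query with identity 2
rank1, mean_ap = rank1_and_map(distances, query_ids=[1, 2], gallery_ids=[1, 2, 1])
```

In the toy data, the first query retrieves a correct identity at rank 1, while the second only finds its match at rank 3, so Rank-1 is 0.5 and mAP is lower than 1. Multiplying both by 100 gives percentages of the kind reported in the table.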