Yongming Rao, Guangyi Chen, Jiwen Lu, Jie Zhou
Attention mechanisms have demonstrated great potential in fine-grained visual recognition tasks. In this paper, we present a counterfactual attention learning method to learn more effective attention based on causal inference. Unlike most existing methods that learn visual attention based on conventional likelihood, we propose to learn the attention with counterfactual causality, which provides a tool to measure the attention quality and a powerful supervisory signal to guide the learning process. Specifically, we analyze the effect of the learned visual attention on network prediction through counterfactual intervention and maximize the effect to encourage the network to learn more useful attention for fine-grained image recognition. Empirically, we evaluate our method on a wide range of fine-grained recognition tasks where attention plays a crucial role, including fine-grained image categorization, person re-identification, and vehicle re-identification. The consistent improvement on all benchmarks demonstrates the effectiveness of our method. Code is available at https://github.com/raoyongming/CAL
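The counterfactual intervention described in the abstract can be illustrated with a small sketch. This is not the authors' implementation (the linked repository is the reference for that). It assumes a toy model in which attention weights pool a handful of region features before a linear classifier, and it assumes that the counterfactual is produced by swapping the learned attention for random weights. All function names, features, and weights below are hypothetical and chosen only for the example.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Negative log-probability of the true class."""
    return -math.log(softmax(logits)[label])


def predict(attention, part_features, class_weights):
    """Pool region features with attention, then apply a linear classifier.

    attention:     K weights, one per image region
    part_features: K x D region features
    class_weights: C x D classifier weights
    returns:       C logits
    """
    dim = len(part_features[0])
    pooled = [
        sum(a * feat[d] for a, feat in zip(attention, part_features))
        for d in range(dim)
    ]
    return [sum(w * p for w, p in zip(row, pooled)) for row in class_weights]


def attention_effect(attention, counterfactual_attention,
                     part_features, class_weights):
    """Difference between the factual and the counterfactual prediction."""
    factual = predict(attention, part_features, class_weights)
    counterfactual = predict(counterfactual_attention, part_features,
                             class_weights)
    return [f - c for f, c in zip(factual, counterfactual)]


# Toy data (assumed for illustration): 3 regions, 2 feature dims, 2 classes.
part_features = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
class_weights = [[2.0, -1.0], [-1.0, 2.0]]
label = 0
attention = [0.9, 0.05, 0.05]

# Assumed intervention: replace the learned attention with random weights.
rng = random.Random(0)
random_attention = [rng.random() for _ in attention]

effect = attention_effect(attention, random_attention,
                          part_features, class_weights)

# Supervising the effect alongside the ordinary prediction rewards attention
# that changes the output in favour of the correct class.
loss = (cross_entropy(predict(attention, part_features, class_weights), label)
        + cross_entropy(effect, label))
print("effect:", effect)
print("loss:", loss)
```

In this sketch, attention that is no better than the counterfactual yields an effect close to zero, so the second loss term gives it no credit. That is the sense in which the effect can serve both as a quality measure and as a training signal.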
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Person Re-Identification | MSMT17 | Rank-1 | 84.2 | CAL(ResNet50) |
| Person Re-Identification | MSMT17 | mAP | 64 | CAL(ResNet50) |
| Person Re-Identification | Market-1501 | Rank-1 | 95.5 | CAL |
| Person Re-Identification | Market-1501 | mAP | 89.5 | CAL |
| Person Re-Identification | DukeMTMC-reID | Rank-1 | 90 | CAL |
| Person Re-Identification | DukeMTMC-reID | mAP | 80.5 | CAL |
| Few-Shot Learning | Stanford Cars | 12-shot Accuracy | 82.9 | CAL |
| Few-Shot Learning | Stanford Cars | 16-shot Accuracy | 88.9 | CAL |
| Few-Shot Learning | Stanford Cars | 4-shot Accuracy | 42.2 | CAL |
| Few-Shot Learning | Stanford Cars | 8-shot Accuracy | 71.8 | CAL |
| Few-Shot Learning | FGVC Aircraft | 12-shot Accuracy | 67.6 | CAL |
| Few-Shot Learning | FGVC Aircraft | 16-shot Accuracy | 74.3 | CAL |
| Few-Shot Learning | FGVC Aircraft | 4-shot Accuracy | 35.2 | CAL |
| Few-Shot Learning | FGVC Aircraft | 8-shot Accuracy | 55.4 | CAL |
| Few-Shot Learning | FGVC Aircraft | Harmonic mean | 35.2 | CAL |
| Few-Shot Learning | DTD | 12-shot Accuracy | 54.6 | CAL |
| Few-Shot Learning | DTD | 16-shot Accuracy | 57.4 | CAL |
| Few-Shot Learning | DTD | 4-shot Accuracy | 40.9 | CAL |
| Few-Shot Learning | DTD | 8-shot Accuracy | 50.4 | CAL |
| Image Classification | FGVC Aircraft | Accuracy | 94.2 | CAL |
| Image Classification | CUB-200-2011 | Accuracy | 90.6 | CAL |
| Intelligent Surveillance | VehicleID Large | Rank-1 | 75.1 | CAL |
| Intelligent Surveillance | VehicleID Large | mAP | 80.9 | CAL |
| Intelligent Surveillance | VehicleID Medium | Rank-1 | 78.2 | CAL |
| Intelligent Surveillance | VehicleID Medium | mAP | 83.8 | CAL |
| Intelligent Surveillance | VeRi-776 | Rank-1 | 95.4 | CAL |
| Intelligent Surveillance | VeRi-776 | Rank-5 | 97.9 | CAL |
| Intelligent Surveillance | VeRi-776 | mAP | 74.3 | CAL |
| Intelligent Surveillance | VehicleID Small | Rank-1 | 82.5 | CAL |
| Intelligent Surveillance | VehicleID Small | mAP | 87.8 | CAL |
| Fine-Grained Image Classification | FGVC Aircraft | Accuracy | 94.2 | CAL |
| Fine-Grained Image Classification | CUB-200-2011 | Accuracy | 90.6 | CAL |
| Meta-Learning | Stanford Cars | 12-shot Accuracy | 82.9 | CAL |
| Meta-Learning | Stanford Cars | 16-shot Accuracy | 88.9 | CAL |
| Meta-Learning | Stanford Cars | 4-shot Accuracy | 42.2 | CAL |
| Meta-Learning | Stanford Cars | 8-shot Accuracy | 71.8 | CAL |
| Meta-Learning | FGVC Aircraft | 12-shot Accuracy | 67.6 | CAL |
| Meta-Learning | FGVC Aircraft | 16-shot Accuracy | 74.3 | CAL |
| Meta-Learning | FGVC Aircraft | 4-shot Accuracy | 35.2 | CAL |
| Meta-Learning | FGVC Aircraft | 8-shot Accuracy | 55.4 | CAL |
| Meta-Learning | FGVC Aircraft | Harmonic mean | 35.2 | CAL |
| Meta-Learning | DTD | 12-shot Accuracy | 54.6 | CAL |
| Meta-Learning | DTD | 16-shot Accuracy | 57.4 | CAL |
| Meta-Learning | DTD | 4-shot Accuracy | 40.9 | CAL |
| Meta-Learning | DTD | 8-shot Accuracy | 50.4 | CAL |
| Vehicle Re-Identification | VehicleID Large | Rank-1 | 75.1 | CAL |
| Vehicle Re-Identification | VehicleID Large | mAP | 80.9 | CAL |
| Vehicle Re-Identification | VehicleID Medium | Rank-1 | 78.2 | CAL |
| Vehicle Re-Identification | VehicleID Medium | mAP | 83.8 | CAL |
| Vehicle Re-Identification | VeRi-776 | Rank-1 | 95.4 | CAL |
| Vehicle Re-Identification | VeRi-776 | Rank-5 | 97.9 | CAL |
| Vehicle Re-Identification | VeRi-776 | mAP | 74.3 | CAL |
| Vehicle Re-Identification | VehicleID Small | Rank-1 | 82.5 | CAL |
| Vehicle Re-Identification | VehicleID Small | mAP | 87.8 | CAL |
| Classification | FGVC Aircraft | OOD Accuracy (%) | 25.1 | CAL + ALIA |
| Classification | FGVC Aircraft | Top-1 Accuracy (%) | 71.8 | CAL + ALIA |
| Classification | FGVC Aircraft | OOD Accuracy (%) | 10.2 | CAL |
| Classification | FGVC Aircraft | Top-1 Accuracy (%) | 71 | CAL |