Ao Zhang, Yuan Yao, Qianyu Chen, Wei Ji, Zhiyuan Liu, Maosong Sun, Tat-Seng Chua
Scene graph generation (SGG) is designed to extract (subject, predicate, object) triplets from images. Recent works have made steady progress on SGG and provide useful tools for high-level vision and language understanding. However, due to data distribution problems, including long-tail distribution and semantic ambiguity, the predictions of current SGG models tend to collapse to a few frequent but uninformative predicates (e.g., on, at), which limits the practical application of these models in downstream tasks. To address these problems, we propose a novel Internal and External Data Transfer (IETrans) method, which can be applied in a plug-and-play fashion and scaled to large SGG with 1,807 predicate classes. IETrans aims to relieve the data distribution problem by automatically creating an enhanced dataset that provides more sufficient and coherent annotations for all predicates. When trained on the enhanced dataset, a Neural Motif model doubles its macro performance while maintaining competitive micro performance. The code and data are publicly available at https://github.com/waxnkw/IETrans-SGG.pytorch.
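The distinction between micro and macro performance can be illustrated with a small sketch. It assumes that "micro" corresponds to recall pooled over all ground-truth triplets (as in Recall@K) and "macro" to recall averaged per predicate class (as in mean Recall@K). The function name `micro_macro_recall` and the toy data are hypothetical, and the per-image averaging used in real SGG evaluation is omitted for brevity.

```python
from collections import defaultdict

def micro_macro_recall(gt_predicates, hits):
    """Return (micro, macro) recall for a set of ground-truth triplets.

    gt_predicates: predicate label of each ground-truth triplet.
    hits: True where the triplet was recovered among the top-K predictions.
    """
    total = defaultdict(int)   # ground-truth count per predicate
    found = defaultdict(int)   # recovered count per predicate
    for pred, hit in zip(gt_predicates, hits):
        total[pred] += 1
        found[pred] += int(hit)
    # Micro: pool all triplets, so frequent predicates dominate.
    micro = sum(found.values()) / sum(total.values())
    # Macro: average the per-predicate recalls, so each class counts equally.
    macro = sum(found[p] / total[p] for p in total) / len(total)
    return micro, macro

# Toy long-tail case: the frequent "on" is always recovered,
# the rarer but more informative "standing on" never is.
gt = ["on"] * 8 + ["standing on"] * 2
hits = [True] * 8 + [False] * 2
micro, macro = micro_macro_recall(gt, hits)
print(micro, macro)  # 0.8 0.5
```

In this toy case a model that collapses to the frequent predicate still scores well on the micro metric, while the macro metric exposes the missed tail class.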
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Scene Parsing | Visual Genome | Recall@100 | 27.2 | IETrans |
| Scene Parsing | Visual Genome | Recall@50 | 23.5 | IETrans |
| Scene Parsing | Visual Genome | mean Recall@100 | 18 | IETrans |
| Scene Parsing | Visual Genome | F@100 | 44.1 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; PredCls mode) |
| Scene Parsing | Visual Genome | mR@20 | 28.9 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; PredCls mode) |
| Scene Parsing | Visual Genome | ng-mR@20 | 36 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; PredCls mode) |
| Scene Parsing | Visual Genome | F@100 | 26 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; SGCls mode) |
| Scene Parsing | Visual Genome | mR@20 | 17.5 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; SGCls mode) |
| Scene Parsing | Visual Genome | ng-mR@20 | 21.8 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; SGCls mode) |
| Scene Parsing | Visual Genome | F@100 | 21.7 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; SGDet mode) |
| Scene Parsing | Visual Genome | mR@20 | 10.9 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; SGDet mode) |
| Scene Parsing | Visual Genome | ng-mR@20 | 13.4 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; SGDet mode) |
| 2D Semantic Segmentation | Visual Genome | Recall@100 | 27.2 | IETrans |
| 2D Semantic Segmentation | Visual Genome | Recall@50 | 23.5 | IETrans |
| 2D Semantic Segmentation | Visual Genome | mean Recall@100 | 18 | IETrans |
| 2D Semantic Segmentation | Visual Genome | F@100 | 44.1 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; PredCls mode) |
| 2D Semantic Segmentation | Visual Genome | mR@20 | 28.9 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; PredCls mode) |
| 2D Semantic Segmentation | Visual Genome | ng-mR@20 | 36 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; PredCls mode) |
| 2D Semantic Segmentation | Visual Genome | F@100 | 26 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; SGCls mode) |
| 2D Semantic Segmentation | Visual Genome | mR@20 | 17.5 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; SGCls mode) |
| 2D Semantic Segmentation | Visual Genome | ng-mR@20 | 21.8 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; SGCls mode) |
| 2D Semantic Segmentation | Visual Genome | F@100 | 21.7 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; SGDet mode) |
| 2D Semantic Segmentation | Visual Genome | mR@20 | 10.9 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; SGDet mode) |
| 2D Semantic Segmentation | Visual Genome | ng-mR@20 | 13.4 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; SGDet mode) |
| Scene Graph Generation | Visual Genome | Recall@100 | 27.2 | IETrans |
| Scene Graph Generation | Visual Genome | Recall@50 | 23.5 | IETrans |
| Scene Graph Generation | Visual Genome | mean Recall@100 | 18 | IETrans |
| Scene Graph Generation | Visual Genome | F@100 | 44.1 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; PredCls mode) |
| Scene Graph Generation | Visual Genome | mR@20 | 28.9 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; PredCls mode) |
| Scene Graph Generation | Visual Genome | ng-mR@20 | 36 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; PredCls mode) |
| Scene Graph Generation | Visual Genome | F@100 | 26 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; SGCls mode) |
| Scene Graph Generation | Visual Genome | mR@20 | 17.5 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; SGCls mode) |
| Scene Graph Generation | Visual Genome | ng-mR@20 | 21.8 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; SGCls mode) |
| Scene Graph Generation | Visual Genome | F@100 | 21.7 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; SGDet mode) |
| Scene Graph Generation | Visual Genome | mR@20 | 10.9 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; SGDet mode) |
| Scene Graph Generation | Visual Genome | ng-mR@20 | 13.4 | IETrans (MOTIFS-ResNeXt-101-FPN backbone; SGDet mode) |
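The relationship between the metrics in the table can be sketched as follows, under the assumption (not stated on this page) that F@K denotes the harmonic mean of Recall@K and mean Recall@K. The helper name `f_at_k` is hypothetical. Pairing the unlabelled Recall@100 and mean Recall@100 rows with the SGDet F@100 row is also an assumption, made only because the numbers are consistent.

```python
def f_at_k(recall, mean_recall):
    """Harmonic mean of Recall@K and mean Recall@K (assumed definition of F@K)."""
    if recall + mean_recall == 0:
        return 0.0
    return 2 * recall * mean_recall / (recall + mean_recall)

# Values taken from the table above: Recall@100 = 27.2, mean Recall@100 = 18.
f100 = f_at_k(27.2, 18.0)
print(round(f100, 1))  # 21.7
```

Under this assumption the result agrees with the F@100 value of 21.7 reported for SGDet mode. A harmonic mean rewards models that balance micro and macro performance rather than maximizing one at the expense of the other.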