Nihal V. Nayak, Stephen H. Bach
Zero-shot learning relies on semantic class representations such as hand-engineered attributes or learned embeddings to predict classes without any labeled examples. We propose to learn class representations by embedding nodes from common sense knowledge graphs in a vector space. Common sense knowledge graphs are an untapped source of explicit high-level knowledge that requires little human effort to apply to a range of tasks. To capture the knowledge in the graph, we introduce ZSL-KG, a general-purpose framework with a novel transformer graph convolutional network (TrGCN) for generating class representations. Our proposed TrGCN architecture computes non-linear combinations of node neighbourhoods. Our results show that ZSL-KG improves over existing WordNet-based methods on five out of six zero-shot benchmark datasets in language and vision.
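The core idea of TrGCN — computing a non-linear, permutation-invariant combination of a node's neighborhood with a transformer before pooling — can be illustrated with a minimal single-head sketch. This is not the authors' implementation; all names (`trgcn_layer`, the weight matrices, the toy graph) are illustrative, and the real architecture stacks such layers with learned parameters over a common sense knowledge graph.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def trgcn_layer(features, neighbors, Wq, Wk, Wv, Wo):
    """Sketch of one TrGCN-style layer (illustrative, not the authors' code).

    For each node, single-head self-attention runs over the set of
    neighbor embeddings (a non-linear combination of the neighborhood),
    the transformed neighbors are mean-pooled, and the pooled vector is
    concatenated with the node's own embedding and projected.
    """
    out = []
    for v, nbrs in enumerate(neighbors):
        H = features[nbrs]                      # (k, d) neighbor embeddings
        Q, K, V = H @ Wq, H @ Wk, H @ Wv        # attention projections
        scores = Q @ K.T / np.sqrt(K.shape[1])  # scaled dot-product
        attn = softmax(scores, axis=-1)
        pooled = (attn @ V).mean(axis=0)        # aggregate the neighborhood
        combined = np.concatenate([features[v], pooled])
        out.append(np.maximum(combined @ Wo, 0.0))  # ReLU projection
    return np.stack(out)

# Toy graph: 3 nodes; each neighbor list includes the node itself.
rng = np.random.default_rng(0)
d = 4
features = rng.normal(size=(3, d))
neighbors = [[0, 1], [0, 1, 2], [1, 2]]
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Wo = rng.normal(size=(2 * d, d))
reps = trgcn_layer(features, neighbors, Wq, Wk, Wv, Wo)
print(reps.shape)  # one d-dimensional representation per node: (3, 4)
```

Because the attention-and-pool step is a set function over the neighbor embeddings, the output is invariant to the order in which neighbors are listed, unlike simple concatenation-based aggregators.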
| Task | Dataset | Metric | Value | Model |
|---|---|---|---|---|
| Zero-Shot Learning | SNIPS | Accuracy | 88.98 | ZSL-KG |
| Zero-Shot Learning | aPY - 0-Shot | Top-1 accuracy | 60.54 | ZSL-KG |
| Zero-Shot Learning | aPY - 0-Shot | Harmonic mean | 61.57 | ZSL-KG |
| Zero-Shot Learning | AwA2 | Average top-1 accuracy | 78.08 | ZSL-KG |
| Zero-Shot Learning | AwA2 | Harmonic mean | 74.58 | ZSL-KG |
| Zero-Shot Learning | BBN Pronoun Coreference and Entity Type Corpus | F1 | 26.69 | ZSL-KG |
| Zero-Shot Learning | OntoNotes | F1 | 45.21 | ZSL-KG |