Semi-iNat

Semi-Supervised iNaturalist

Images · Unknown · Introduced 2021-06-02

Semi-iNat is a challenging dataset for semi-supervised classification with a long-tailed distribution of classes, fine-grained categories, and domain shifts between labeled and unlabeled data. The data is obtained from iNaturalist, a community-driven project aimed at collecting observations of biodiversity.

The dataset comes with standard training, validation and test sets. The training set consists of:

  • labeled images from 810 species, where around 10% of the images are labeled.

  • unlabeled images, drawn both from the same set of classes as the labeled images (in-class) and from a different set of classes (out-of-class). The species in the unlabeled images are guaranteed to share a phylum with species in the labeled set. This reflects a common scenario where a coarser taxonomic label of an image can be easily obtained.
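The composition of the training set described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Sample` record, its field names, and both helper functions are hypothetical and do not reflect the dataset's actual file format or any official loader. It assumes each image carries a coarse phylum label and that the species label is missing for unlabeled images.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Sample:
    # Hypothetical record layout, not the dataset's real annotation schema.
    path: str                # image file path
    phylum: str              # coarse taxonomic label
    species: Optional[str]   # None for unlabeled images


def split_training_set(samples: List[Sample]) -> Tuple[List[Sample], List[Sample]]:
    """Separate samples with a species label from those without one."""
    labeled = [s for s in samples if s.species is not None]
    unlabeled = [s for s in samples if s.species is None]
    return labeled, unlabeled


def phylum_coverage(labeled: List[Sample], unlabeled: List[Sample]) -> bool:
    """Return True if every unlabeled sample's phylum also occurs in the labeled set."""
    labeled_phyla = {s.phylum for s in labeled}
    return all(s.phylum in labeled_phyla for s in unlabeled)
```

A check like `phylum_coverage` mirrors the guarantee stated in the list above: an unlabeled image may belong to an unseen species, but its phylum should still appear among the labeled data.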