PMC-OA

PubMed Central Open Access

Introduced 2023-03-13

PMC-OA is a large-scale dataset of 1.65M image-text pairs built from figures and captions in PubMed Central. The collection covers 2,478,267 available papers, from which 12,211,907 figure-caption pairs are extracted.

Source: PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents