PERO layout dataset
Images
Introduced 2021-02-23
We compiled a new dataset, the PERO layout dataset, containing 683 images from various sources and historical periods. Every image has complete manual annotations of text blocks, text line polygons, and baselines. The documents range from handwritten letters to historic printed books and newspapers and cover various languages, including Arabic and Russian. Part of the dataset was collected from existing datasets (cBAD, IMPACT, and BADAM) and extended with additional layout annotations. The dataset is split into 456 training and 227 testing images.