RuFa
Images · Introduced 2020-07-17
The RuFa (Ruqaa-Farsi) dataset contains images of text written in one of two Arabic fonts: Ruqaa and Nastaliq (Farsi). It contains 40,000 synthesized images and 516 real ones, 40,516 in total. Images are in RGB JPG format at 100×100 px. The text in the images varies in number of words, position, size, and opacity.
Real images were extracted from:
- “The Rules of Arabic Calligraphy” by Hashem Al-Khatat, 1986.
- “Ottman Fonts” by Muhammad Amin Osmanli Ketbkhana.
The synthesis process is described in detail in this post.
Dataset folder structure:
/rufa (40,516 images)
- /real (516 images)
  * /ruqaa (260 images)
  * /farsi (256 images)
- /synth (40,000 images)
  * /ruqaa (20,000 images)
  * /farsi (20,000 images)
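The folder layout above maps directly onto a labeled index: the first directory level gives the source (real or synthesized) and the second gives the font class. The following is a minimal sketch, not part of the dataset's own tooling. The function names `index_rufa` and `count_by_split` are hypothetical, and the `.jpg`/`.jpeg` file extensions are an assumption based on the stated JPG format.

```python
from collections import Counter
from pathlib import Path

# Assumed layout: <root>/{real,synth}/{ruqaa,farsi}/<image>.jpg
SOURCES = ("real", "synth")
FONTS = ("ruqaa", "farsi")


def index_rufa(root):
    """Return a list of (path, source, font) tuples for every JPG under root."""
    root = Path(root)
    samples = []
    for source in SOURCES:
        for font in FONTS:
            folder = root / source / font
            if not folder.is_dir():
                continue  # tolerate a partial copy of the dataset
            for path in sorted(folder.iterdir()):
                # Extension check is an assumption; adjust if files differ.
                if path.suffix.lower() in {".jpg", ".jpeg"}:
                    samples.append((path, source, font))
    return samples


def count_by_split(samples):
    """Count samples per (source, font) pair."""
    return Counter((source, font) for _, source, font in samples)
```

On a complete copy of the dataset, `count_by_split(index_rufa("rufa"))` should reproduce the per-folder counts in the listing above, which makes it a quick integrity check after download.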